Abstract

Consider a binary additive noise channel with noiseless feedback. When the noise is a stationary and ergodic process $Z$, the
capacity is $1 - H(Z)$ ($H(\cdot)$ denoting the entropy rate). It is shown analogously that when the noise is a deterministic sequence $z^\infty$,
the capacity under finite-state encoding and decoding is $1 - \rho(z^\infty)$, where $\rho(\cdot)$ is Lempel and Ziv's finite-state compressibility.
This quantity is termed the porosity $\sigma(\cdot)$ of an individual noise sequence. A sequence of schemes is presented that universally
achieves porosity for any noise sequence. These converse and achievability results may be interpreted both as a channel-coding
counterpart to Ziv and Lempel's work in universal source coding and as an extension of the work by Lomnitz and Feder and
Shayevitz and Feder on communication across modulo-additive channels. Additionally, a slightly more practical architecture is
suggested that draws a connection with finite-state predictability, as introduced by Feder, Gutman, and Merhav.

Index Terms

Lempel-Ziv, universal source coding, universal channel coding, modulo-additive channel, compressibility, predictability

I. Introduction 

The "core" results of information theory, starting with Shannon's source and channel coding theorems, are concerned
with probabilistic systems of a known model: an iid Bernoulli(1/4) source must be compressed, or perhaps bits are to be
communicated across an AWGN channel of known SNR. One may seek additional generality by asking that a coding scheme
simultaneously function for an entire class of such probabilistic models. In the case of source coding, Ziv and Lempel [1],
[2] and Ziv [3] take this question to its logical extreme and ask that a compressor not only achieve the optimal rate for any
probabilistic source model, but do so for any individual source sequence. In [2], it is discovered that the traditionally relevant
probabilistic measurement, entropy rate, generalizes into a measure for an individual sequence: compressibility. In this
paper, an analogous set of questions yields an analogous set of answers in the context of noisy channel coding with feedback.

Historically, far more attention has been paid to the issue of universality in source coding than in channel coding. The source
of this discrepancy is readily apparent from Figs. 1 and 2. The encoder of Fig. 2 never observes the noise sequence in any
way, and so its codebook cannot be dynamically customized to suit the channel. The source encoder of Fig. 1, on the other
hand, has direct access to the source sequence and can therefore adjust to its statistics. As such, the degree of universality that
can be requested in the classical channel-coding setup is far more restricted than in source coding. Certainly, this does not
preclude discussion of "universality," but the term must take on a considerably looser meaning, as is discussed in Sec. II.

The playing field is considerably leveled by introducing a noiseless feedback link, as in Fig. 3. In particular, the
modulo-additive channel of Fig. 4 allows for a clear and precise analogy to the universal source coding of Lempel and Ziv. To highlight
some of the parallels:

• Individual sequences. In the source coding setting, Lempel and Ziv replace the source random process $X^\infty$ with a
deterministic sequence $x^\infty$. Here, the standard stochastic description of channel noise $Z^\infty$ is supplanted by a specific
individual sequence $z^\infty$.

• Finite state constraint. Lempel and Ziv ask the question: how well can a source encoder/decoder perform for a specific
individual sequence, if the engineer designing the encoder/decoder knows the sequence ahead of time? Clearly, if the
encoder and decoder are unconstrained, this problem trivializes: one may design a decoder that, with absolutely no input
from the encoder, produces $x_t$ at time $t$. A finite-state requirement, which reflects the constraints of the real world, is
therefore introduced for both encoder and decoder. Within this class of schemes, Lempel and Ziv provide a (non-trivial)
converse.

Similarly, if one permits arbitrary encoders and decoders for the individual noise sequence channel coding setting, the
maximum communication rate of $\log|\mathcal{X}|$ may be easily attained for any noise sequence $z^\infty$. One merely sets the encoder's
output to the difference between the source sequence and the noise sequence, $M^\infty - z^\infty$. The channel cancels out the
noise, and the decoder need merely read the channel output to obtain the message $M^\infty$. To avoid such triviality, a
finite-state encoder/decoder constraint analogous to that of Lempel and Ziv is introduced. Within this class of schemes,
a non-trivial converse is proven.

Fig. 1. The model for universal source coding. An unknown source is provided to an encoder, which must describe it to the decoder.

Fig. 2. The model for universal channel coding. A message must be communicated over an unknown channel.

Fig. 3. Universal channel coding with noiseless feedback.

Fig. 4. An additive-noise channel with noiseless feedback.

• In traditional Shannon theoretic results, converses that apply for a particular source/channel model are accompanied
by achievability schemes that function for that particular model. For instance, given a BSC with crossover probability
$p$, Shannon's channel coding converse tells us no reliable sequence of coding schemes can achieve a rate better than
$1 - h_b(p)$, and Shannon provides a rate-$(1 - h_b(p))$-achieving sequence of codebooks customized for the BSC($p$). One
might similarly ask that Lempel and Ziv accompany their converse for a particular source sequence with an
achievability scheme for that particular sequence. The achievability they provide, however, goes several steps further: it
functions for any individual source sequence. Although the scheme they suggest is infinite-state, it is also
shown that, by truncation and repetition, achievability is possible through a sequence of finite-state source-coding
schemes.

Similarly, both the infinite-state scheme of Lomnitz and Feder [4] and the sequence of finite-state achievability schemes
$\{\mathcal{F}^m\}$ presented in Sec. VIII achieve the channel-coding converses for any noise sequence.

• The compressibility $\rho(x^\infty)$ of a sequence, introduced by Lempel and Ziv, is its best possible compression rate. The
analogous quantity in the channel-coding case is here referred to as the channel's porosity $\sigma(z^\infty)$. In the binary additive
noise case it is demonstrated to be equal to $1 - \rho(z^\infty)$. Both are analogues of probabilistic quantities: entropy rate
in the case of compressibility, and one minus the entropy rate in the case of porosity. Both may also be interpreted as
generalizations of their probabilistic analogues, since according to Thm. 4 in [2], $\rho(X^\infty) = H(X)$ with probability 1 if
$X^\infty$ is an ergodic source.

To summarize, we show both that the porosity of the noise is the best possible rate achievable within the class of finite-state
schemes, and that there exists a sequence of finite-state schemes that simultaneously achieves porosity for all noise sequences.

II. Related Work 

Lomnitz and Feder introduce the notion of competitive universality to channel coding in [4]. The reference class used in this
comparison consists of iterated fixed-blocklength (IFB) schemes, which ignore the feedback channel and simply employ block
coding across the noisy channel. Rate-adaptive schemes, on the other hand, make arbitrary use of feedback and communicate a
fixed number of bits over at most $n$ channel uses. It is proven that IFB schemes can do no better than porosity (rate $1 - \overline{\rho}(z^\infty)$),
and a rate-adaptive scheme built upon LZ78 is shown to achieve porosity.

In a sense, the results reported here take these statements of competitive optimality a step further: we ask that the achievability
schemes not only outperform every element of the reference class, but that they themselves be elements of the reference class. This
establishes porosity as a channel capacity of sorts. As IFB codes frequently cannot even achieve porosity for a given noise
sequence $z^\infty$, let alone for the entire set of noise sequences, this requires that the reference class be widened to the class of
all finite-state schemes.

The porosity-achieving rate-adaptive scheme introduced by Lomnitz and Feder does not quite fall into this class, as it has
infinitely many states. One might consider truncating and repeating it in order to construct a finite-state scheme. While this
could potentially work, we find it somewhat easier to build from the schemes of Shayevitz and Feder [5], whose performance
guarantees mesh well with the asymptotic performance metrics of interest here.

In [5], Shayevitz and Feder establish the initial results that have sparked much of the subsequent work on this problem. An
extremely general family of channels is considered, but the results provided are most meaningful when restricting attention to
the individual additive noise sequence setting. Of interest are two things: the construction of variable-rate, fixed-blocklength
schemes expanded from Horstein's coding method, and the strong performance guarantees that are provided. The schemes are
shown to achieve, for any noise sequence, the empirical capacity, or one minus the first-order empirical entropy. By operating
this coding technique over blocks of channel uses, one can potentially generalize to arbitrary-order empirical entropies. The
achievability schemes $\{\mathcal{F}^m\}$ presented in this paper are an extension of this idea.

As with both [4] and [5], Eswaran et al. [6] consider a very broad class of channels with noiseless feedback, but the
results provided are most meaningful in the modulo-additive setting with an individual noise sequence. Extending [5], it is
demonstrated that even when the feedback is asymptotically zero-rate, the empirical capacity is still universally achievable.

As previously mentioned, even in the absence of a feedback link questions of universal channel coding can be considered. 
The principal complication in this setting is that the encoder no longer has any information about the specific channel, and so 
neither the rate nor the transmission methodology can be adapted. 

One may nonetheless ask that the decoder adapt to the channel. Csiszár and Körner [7] consider the class of
DMCs that share a common input alphabet $\mathcal{X}$; call this class $\mathcal{A}(\mathcal{X})$. For a randomly generated codebook, a universal decoder
for the entire class $\mathcal{A}(\mathcal{X})$ is constructed and its performance compared to that of a decoder customized for whichever specific
channel happens to appear (the maximum likelihood decoder). The universal decoder is not only shown to match the ML decoder
in terms of vanishing error, but it is also found to achieve the same error exponent.

Ziv [8] and Lapidoth and Ziv [9] seek to expand such a result into the territory of channels with memory. Each considers a
fairly specific form of memory: finite-state channels with deterministic state transitions [8] and those with probabilistic state
transitions [9]. Each also demonstrates achievability through decoding schemes that utilize LZ78-style sequence parsing. Feder
and Lapidoth [10], on the other hand, consider the more general problem of decoding for a parametric family of channels.

Despite the non-adaptability of the encoder in this feedback-less setting, one may seek to maximize the worst-case rate of 
communication across the channel. The fundamental limit of performance in this scenario is the compound channel capacity, 
discussed at length in the review article by Lapidoth and Narayan [11].

A generalization of the above is to take a broadcast approach, wherein channel uncertainty is modeled by having the encoder 
broadcast across all channels in the class considered. The rate region of this broadcast channel then characterizes the rates the 
encoder may simultaneously achieve for each potential channel. Observe that if this rate region can be specifically determined, 
it answers all possible questions of universal decoding. Shamai and Steiner [12] leverage this approach for the case of fading
channels. 

III. Structure of Paper 

In Sec. IV, a precise description is provided of the problem setting, the class of finite-state schemes, and the relevant
performance metrics. Sec. V builds slightly on Lempel and Ziv's definition of compressibility and establishes certain useful
properties. Sec. VI states the three theorems that constitute the core results of this work. Sec. VII contains the proof of the
converse theorems, and Sec. VIII proves achievability. Sec. IX introduces a significantly more practical (but sub-optimal) set
of schemes that establish a connection between porosity and finite-state predictability. Sec. X summarizes this paper's findings.
A few lemmas have somewhat distracting proofs that are relegated to the appendices.

IV. Problem setup 

A deterministic additive noise feedback channel, as depicted in Fig. 4, is defined by a noise sequence $z^\infty \in \mathcal{X}^\infty$, where $\mathcal{X}$
is a finite alphabet with a modulo-addition operator. The channel output at any time $i$ is given by the sum of the noise and
the input: $y_i = x_i + z_i$. Noiseless feedback $u_i = y_{i-1}$ delays the channel output by one time unit before providing it to the
encoder. Without loss of generality, we will concern ourselves primarily with the binary-alphabet case, i.e. $\mathcal{X} = \{0, 1\}$, as the
extension to general finite $\mathcal{X}$ is straightforward.

A. Finite-state Schemes 

A finite-state (FS) encoder/decoder scheme for an additive noise channel, depicted in Fig. 5, consists of several
components:

1) An encoder state variable $s_i^{(e)}$ and a decoder state variable $s_i^{(d)}$, each taking values in a finite set $\mathcal{S}$.

2) A source pointer $p_i$ and a finite lookahead constant $\ell$.

3) An iid common randomness source $\theta_i \sim p_\theta$ taking values in a finite alphabet.

4) An encoding function $x_i = e(s_i^{(e)}, M_{p_i}^{p_i+\ell}, y_i, \theta_i) \in \mathcal{X}$.

5) A decoding length function $L_i = d_L(s_i^{(d)}, y_i, \theta_i)$ that also determines the update of the source pointer: $p_{i+1} = p_i + L_i$.

6) A decoding function $\hat{M}_{p_i}^{p_i+L_i-1} = d_M(s_i^{(d)}, y_i, \theta_i)$.

7) State-update functions for both the encoder, $s_{i+1}^{(e)} = f^{(e)}(s_i^{(e)}, M_{p_i}^{p_i+\ell}, y_i, \theta_i)$, and the decoder, $s_{i+1}^{(d)} = f^{(d)}(s_i^{(d)}, y_i, \theta_i)$.

Fig. 5. A finite-state encoding/decoding scheme for a modulo-additive channel.

At each time step, the encoding function determines the input $x_i$ to the channel, the decoding function estimates the first
$L_i$ source symbols that have yet to be estimated (based on the output $y_i$ of the channel), and the state variables and the source
pointer location are updated in anticipation of the next transmission.
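The time-step loop described above can be sketched in code. This is only an illustrative simulation for the binary case, not the paper's construction: the function name `run_fs_scheme`, the convention that the decoder returns a list whose length plays the role of the decoding length, and the choice to hand the encoder the previous channel output as feedback (with an assumed initial value of 0) are all assumptions made for the sketch.

```python
def run_fs_scheme(message, noise, theta, lookahead,
                  encode, decode, next_enc_state, next_dec_state,
                  enc_state=0, dec_state=0):
    """Simulate a binary FS scheme for len(noise) channel uses (hypothetical helper)."""
    pointer = 0    # source pointer: index of the first source bit not yet estimated
    feedback = 0   # ASSUMPTION: value of the feedback before the first channel use
    estimates, lengths = [], []
    for z, th in zip(noise, theta):
        # The encoder sees the current source bit plus `lookahead` further bits.
        window = tuple(message[pointer:pointer + lookahead + 1])
        x = encode(enc_state, window, feedback, th) % 2
        y = (x + z) % 2                           # modulo-2 additive channel
        decoded = list(decode(dec_state, y, th))  # its length is the decoding length
        enc_state = next_enc_state(enc_state, window, y, th)
        dec_state = next_dec_state(dec_state, y, th)
        estimates.extend(decoded)
        lengths.append(len(decoded))
        pointer += len(decoded)
        feedback = y                              # delayed by one step for the encoder
    return estimates, lengths


# Usage: a one-state "uncoded" scheme that sends each source bit as-is.
uncoded = dict(
    encode=lambda s, w, fb, th: w[0],
    decode=lambda s, y, th: [y],
    next_enc_state=lambda s, w, y, th: s,
    next_dec_state=lambda s, y, th: s,
)
est, lens = run_fs_scheme([1, 0, 1, 1], [0, 1, 0, 0], [0] * 4, 0, **uncoded)
```

With the noise sequence above, the uncoded scheme decodes one bit per channel use and makes an error exactly where the noise is 1, which is the behavior a nontrivial FS scheme has to improve on.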

Observe, first, that this class of schemes is sufficiently general to include the following as special cases:

1) The class of "iterated fixed-length" block schemes, as defined by Lomnitz and Feder [4]. These are simply block codes
that ignore the feedback. The common randomness at encoder and decoder allows for randomly generated block codes
as well.

2) Schemes that transmit a variable number of source symbols over a fixed number of channel uses, before resetting their
state variables and repeating the operation (defined more precisely in Sec. VIII-A as "Repeated Finite-Extent" schemes).

3) Schemes that transmit a variable (or fixed) number of source symbols over a variable (but bounded) number of channel
uses (also known as "rate-adaptive" schemes [4]).

Secondly, notice that without certain restrictions in the definition of class FS, the problem can become trivial:

1) Suppose that the encoder is permitted to be infinite-state. The system designer may then allow the encoder state $s_i^{(e)}$
to be the current time index $i$. This allows the encoding function to be a function of $i$, which in turn permits the
encoding function to be customized for a particular noise sequence $z^\infty$: $e(i, M_{p_i}) = M_{p_i} - z_i$. Sending this through the
channel, $z_i$ is canceled out. The decoder need merely read the channel output to obtain the message at the maximum
possible rate, $\log|\mathcal{X}|$.

2) Suppose that the decoder is permitted to be infinite-state. One may reverse the above construction by having the encoder
blindly send the message bits through the channel, $e(M_{p_i}) = M_{p_i}$, and asking the decoder to cancel out the noise.
Specifically, letting $s_i^{(d)} = i$, the decoding function can be a function of the time $i$. This allows a clever system
designer to choose $d_M(i, y_i) = y_i - z_i$, which guarantees that $\hat{M}_{p_i} = M_{p_i}$ and that $L_i = 1$ for any $i$.

3) Finally, suppose that the finite-lookahead requirement is nonexistent; that is, the encoding function can look at the entire
untransmitted message stream: $x_i = e(s_i^{(e)}, M_{p_i}^\infty, y_i, \theta_i)$. As we will illustrate, this is identical to allowing the encoder an
infinite number of states. If $M^\infty$ is a Bernoulli(1/2) sequence, then with probability one there exists a one-to-one map
between $M_{p_i}^\infty$ and $i$. The encoder may therefore send $e(M_{p_i}^\infty) = M_{p_i} + z_{p_i}$ as the channel input at time $i$. The decoder,
as before, simply reads the channel output, achieving the maximum rate $\log|\mathcal{X}|$.
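The first trivializing construction is easy to check numerically. The following is a minimal sketch for the binary case, assuming the designer knows the noise sequence in advance; the function name `cancel_known_noise` is hypothetical.

```python
def cancel_known_noise(message, noise):
    """Time-indexed encoder x_i = M_i - z_i (mod 2); the decoder just reads y_i."""
    decoded = []
    for m, z in zip(message, noise):
        x = (m - z) % 2    # encoder pre-subtracts the noise it knows is coming
        y = (x + z) % 2    # the channel adds the noise back
        decoded.append(y)  # the output equals the message bit
    return decoded


message = [1, 0, 0, 1, 1, 0]
noise = [1, 1, 0, 1, 0, 0]
recovered = cancel_known_noise(message, noise)
```

Every message bit gets through regardless of the noise, at one bit per channel use. The catch is that the encoder must index into the entire noise sequence by time, which is exactly what a finite state space rules out.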



B. Performance metrics 

Channel coding typically concerns itself with the tradeoff between rate of communication and the frequency of errors. In
the individual sequence setting of interest to us, we define the instantaneous rate and bit-error rate of an FS scheme at time $n$
as

$$R_n = \frac{1}{n} \sum_{i=1}^{n} L_i,$$

and

$$\epsilon_n = \frac{1}{nR_n} \sum_{i=1}^{nR_n} \mathbb{1}\{\hat{M}_i \neq M_i\}.$$

We consider two interpretations of these quantities.

Best-Case. An FS scheme best-case $p$-achieves rate/error $(R, \epsilon)$ for a noise sequence $z^\infty$ if with at least probability $p$ there
exists a sequence of points $\{n_i\} \subset \mathbb{Z}^+$ such that $\lim_{i\to\infty} R_{n_i} \ge R$ and $\lim_{i\to\infty} \epsilon_{n_i} \le \epsilon$. In other words, a performance monitor
that observes the system at the "right" times will see it achieve $(R, \epsilon)$ with probability at least $p$. If $p$ is 1, we say that the
scheme simply best-case achieves $(R, \epsilon)$.

Worst-Case. An FS scheme worst-case $p$-achieves rate/error $(R, \epsilon)$ if with at least probability $p$ both $\liminf_{n\to\infty} R_n \ge R$
and $\limsup_{n\to\infty} \epsilon_n \le \epsilon$. In other words, a performance monitor observing the system at any set of sample times will see it
achieve $(R, \epsilon)$ with probability at least $p$. If $p$ is 1, the scheme is said to worst-case achieve $(R, \epsilon)$.

Observe that the randomness in these definitions has two possible sources: the source sequence $M^\infty$ and the
common-information sequence $\theta^\infty$ used by the FS scheme. Sometimes the source $M^\infty$ will be a fixed sequence, but this is always
made clear from context.
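As an illustration of how a performance monitor would track these quantities, the sketch below computes the running rate and running bit-error rate from a record of decoding lengths. It assumes that the bit-error rate at time n is the fraction of source bits decoded so far that were estimated incorrectly, and it returns 0.0 for the error rate while nothing has been decoded; both conventions, and the name `running_metrics`, are assumptions of this sketch.

```python
def running_metrics(lengths, message, estimates):
    """Return (rates, errors): running rate and bit-error rate after each channel use."""
    rates, errors = [], []
    decoded = 0  # number of source bits decoded so far
    wrong = 0    # number of those bits estimated incorrectly
    for n, length in enumerate(lengths, start=1):
        for j in range(decoded, decoded + length):
            wrong += int(message[j] != estimates[j])
        decoded += length
        rates.append(decoded / n)
        # ASSUMPTION: report 0.0 when no bits have been decoded yet.
        errors.append(wrong / decoded if decoded else 0.0)
    return rates, errors


rates, errors = running_metrics([1, 0, 2], [1, 0, 1], [1, 1, 1])
```

On a finite record one can only look at the trajectory of these two lists. The best-case and worst-case notions concern their behavior along subsequences and along the full sequence, respectively, as the horizon grows without bound.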

V. Notions of "compressibility" 

The results of this paper connect the operational notions of achievability to certain long-established individual sequence
properties, first introduced by Lempel and Ziv [2] in a source-coding context. In this section, these properties are defined and
some useful relations are presented between them.

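First, we denote the $k$th order block-by-block empirical distribution

$$\hat{p}^k(a^k)[x^n] = \frac{1}{\lfloor n/k \rfloor} \sum_{i=0}^{\lfloor n/k \rfloor - 1} \mathbb{1}\{x_{ik+1}^{(i+1)k} = a^k\}.$$

If the empirical distribution is instead computed in a sliding-window manner, we denote

$$\hat{p}^k_{sw}(a^k)[x^n] = \frac{1}{n-k+1} \sum_{i=1}^{n-k+1} \mathbb{1}\{x_i^{i+k-1} = a^k\}.$$

The argument $[x^n]$ is occasionally omitted when the context is clear.

The $k$th order block-by-block empirical entropy is indicated by $\hat{H}^k(x^n) = H_{\hat{p}^k}(X^k)$. The sliding-window $k$th order empirical
entropy is similarly written as $\hat{H}^k_{sw}(x^n) = H_{\hat{p}^k_{sw}}(X^k)$.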

As shown by Ziv and Lempel [2], the finite-state compressibility of a sequence $x^\infty$ may be written as

$$\overline{\rho}(x^\infty) = \limsup_{k\to\infty} \limsup_{n\to\infty} \frac{1}{k} \hat{H}^k_{sw}(x^n).$$

An analogous quantity may also be introduced:

$$\underline{\rho}(x^\infty) = \liminf_{k\to\infty} \liminf_{n\to\infty} \frac{1}{k} \hat{H}^k_{sw}(x^n).$$

Operationally, compressibility is the smallest limit supremum compression ratio achievable for a sequence (Theorem 3 in [2]).
It is not difficult to show that, analogously, the second quantity is the smallest possible limit infimum compression ratio.
Informed by this, we refer to the original compressibility quantity as the worst-case compressibility and the new limit infimum
version as the best-case compressibility.

The following lemma, proved in Appendix A, demonstrates that both best-case and worst-case compressibilities may be
computed using either block-by-block or sliding-window empirical entropies.

Lemma 1: Let $x^\infty$ be a finite-alphabet sequence. Then

$$\underline{\rho}(x^\infty) = \lim_{k\to\infty} \liminf_{n\to\infty} \frac{1}{k} \hat{H}^k(x^n)$$

and

$$\overline{\rho}(x^\infty) = \lim_{k\to\infty} \limsup_{n\to\infty} \frac{1}{k} \hat{H}^k(x^n).$$

The porosity of a noise sequence $z^\infty \in \mathcal{Z}^\infty$ is defined in best-case

$$\overline{\sigma}(z^\infty) = \log_2|\mathcal{Z}| - \underline{\rho}(z^\infty)$$

and worst-case

$$\underline{\sigma}(z^\infty) = \log_2|\mathcal{Z}| - \overline{\rho}(z^\infty)$$

varieties as well. Observe the sign changes: while a "good" compressibility is small, a "good" porosity is large. The remainder
of this paper clarifies the operational significance of these quantities.
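For intuition, the empirical entropies that underlie compressibility and porosity can be computed directly on a finite prefix. The sketch below is a finite-n, finite-k stand-in only: the true quantities are double limits, so `porosity_estimate` (a hypothetical helper, like the other names here) should be read as a rough proxy for a binary sequence rather than as the porosity itself. It assumes the prefix is at least k symbols long.

```python
from collections import Counter
from math import log2


def _entropy(counts):
    """Shannon entropy (bits) of the empirical distribution given by `counts`."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def block_entropy(x, k):
    """kth order block-by-block empirical entropy: non-overlapping k-blocks."""
    blocks = [tuple(x[i:i + k]) for i in range(0, len(x) - k + 1, k)]
    return _entropy(Counter(blocks))


def sliding_entropy(x, k):
    """kth order sliding-window empirical entropy: all overlapping k-windows."""
    windows = [tuple(x[i:i + k]) for i in range(len(x) - k + 1)]
    return _entropy(Counter(windows))


def porosity_estimate(z, k):
    """Finite-sample proxy for binary porosity: 1 - (1/k) * block entropy."""
    return 1.0 - block_entropy(z, k) / k


z = [0, 1] * 8  # a perfectly periodic noise sequence
first_order = porosity_estimate(z, 1)
second_order = porosity_estimate(z, 2)
```

The alternating sequence looks like a fair coin to a first-order statistic but is fully predictable at order two, which is why the limit over k matters in the definitions.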
VI. Statement of results

The results of this paper may be summarized as follows: 

1) A converse that upper-bounds the best-case achievable rate by an FS scheme. 

2) A converse that upper-bounds the worst-case achievable rate by an FS scheme. 

3) A sequence of universal FS schemes $\{\mathcal{F}^m\}_{m=1}^\infty$ that simultaneously achieve the best-case and worst-case converse bounds
for any noise sequence $z^\infty$.

Formally, each of these three statements corresponds to a theorem: 

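Theorem 2: Suppose an FS scheme best-case $p$-achieves $(R, \epsilon)$. If $p > 0$, then

$$R \le h_b(\epsilon) + \overline{\sigma}(z^\infty).$$

Theorem 3: Suppose an FS scheme worst-case $p$-achieves $(R, \epsilon)$. If $p > 0$, then

$$R \le h_b(\epsilon) + \underline{\sigma}(z^\infty).$$

Theorem 4: There exists a sequence of schemes $\{\mathcal{F}^m\}$, which for an iid Bernoulli(1/2) source $M^\infty$ and every noise sequence
$z^\infty$ best-case achieves

$$\left(\overline{\sigma}(z^\infty) - \overline{\delta}_m(z^\infty),\ \frac{\epsilon_m}{\overline{\sigma}(z^\infty) - \overline{\delta}_m(z^\infty)}\right)$$

and worst-case achieves

$$\left(\underline{\sigma}(z^\infty) - \underline{\delta}_m(z^\infty),\ \frac{\epsilon_m}{\underline{\sigma}(z^\infty) - \underline{\delta}_m(z^\infty)}\right)$$

with probability one, where $\epsilon_m$, $\underline{\delta}_m$, and $\overline{\delta}_m$ all go to zero as $m \to \infty$.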

Theorems 2 and 3 are proven in Section VII. In Section VIII we introduce the schemes $\{\mathcal{F}^m\}_{m=1}^\infty$ and prove Theorem 4.
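To get a feel for the converse statements, the right-hand side of the rate bounds can be evaluated numerically. The sketch below assumes the bound has the form "binary entropy of the tolerated bit-error rate plus porosity", as in the converse theorems; the function names are hypothetical and the porosity value is simply supplied by the caller.

```python
from math import log2


def binary_entropy(eps):
    """Binary entropy h_b(eps) in bits, with h_b(0) = h_b(1) = 0."""
    if eps <= 0.0 or eps >= 1.0:
        return 0.0
    return -eps * log2(eps) - (1.0 - eps) * log2(1.0 - eps)


def converse_rate_bound(eps, porosity):
    """Largest rate the converse permits at bit-error rate eps: h_b(eps) + porosity."""
    return binary_entropy(eps) + porosity


error_free = converse_rate_bound(0.0, 0.5)  # no errors tolerated
lossy = converse_rate_bound(0.11, 0.5)      # roughly 11% bit errors tolerated
```

With no errors tolerated the bound collapses to the porosity itself, which is the sense in which porosity plays the role of a capacity. Allowing a nonzero bit-error rate loosens the bound by the corresponding binary entropy.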



VII. Proof of Converse 

A. Definitions and Lemmas 

In order to prove the converse theorems, a series of definitions and lemmas is first required. 
Lemma 5: (Selection Lemma) Suppose $X^\infty$ is iid Bernoulli(1/2) and $L$ is a random positive integer with arbitrary conditional
distribution $p_{L|X^\infty}$ with respect to $X^\infty$. Then $H(X^L) \ge E[L]$.

Proof:

$$H(X^L) = \sum_{x^l} p(X^L = x^l) \log \frac{1}{p(X^L = x^l)}$$

$$\overset{(a)}{\ge} \sum_{x^l} p(X^L = x^l) \log \frac{1}{p(X^l = x^l)}$$

$$\overset{(b)}{=} E[L],$$

where step (a) follows because if $X^L = x^l$ then $X^l$ must necessarily equal $x^l$, so that $p(X^L = x^l) \le p(X^l = x^l)$. Step (b) follows from the iid Bernoulli(1/2)
distribution of $X^\infty$, under which $p(X^l = x^l) = 2^{-l}$. ■
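The Selection Lemma can be checked by brute force on short horizons. The sketch below enumerates all equally likely bit strings of a fixed length, applies a deterministic selection rule for the length L, and compares the entropy of the selected prefix with the expected length. The helper name `selection_check` and the two example rules are assumptions chosen for illustration; the lemma itself allows arbitrary (including randomized) rules.

```python
from collections import defaultdict
from itertools import product
from math import log2


def selection_check(stop_rule, horizon):
    """Return (H(X^L), E[L]) when L = stop_rule(x) for x uniform on {0,1}^horizon."""
    probs = defaultdict(float)
    expected_len = 0.0
    weight = 2.0 ** -horizon  # each bit string is equally likely
    for x in product((0, 1), repeat=horizon):
        length = stop_rule(x)  # must lie in 1..horizon
        probs[x[:length]] += weight
        expected_len += weight * length
    entropy = -sum(p * log2(p) for p in probs.values())
    return entropy, expected_len


# A causal rule: stop after one bit if it is 0, otherwise take two bits.
causal = selection_check(lambda x: 1 if x[0] == 0 else 2, 2)

# A rule that peeks ahead: the length depends on the *second* bit.
peeking = selection_check(lambda x: 1 if x[1] == 0 else 2, 2)
```

The causal rule meets the bound with equality (1.5 bits against an expected length of 1.5), while the peeking rule gives 2 bits against 1.5: letting the length depend on bits outside the selected prefix can only add entropy, never remove it.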
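Definition 1: Let $\{L_i\}_{i=1}^\infty$ be a bounded sequence of nonnegative integers, and let $M^\infty$ and $z^\infty$ as usual denote binary
sequences. The $k$-partition of $(M^\infty, z^\infty)$ according to $\{L_i\}$ is the sequence of blocks $\{(M^{L_i}, z^k)_i\}_{i=1}^\infty$, in which the $i$th block
pairs the next $L_i$ symbols of $M^\infty$ with the $i$th length-$k$ block of noise, $z_{(i-1)k+1}^{ik}$.

In this context, $\{L_i\}$ are referred to as the partition lengths.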

Definition 2: Let $x^\infty$ be a sequence of symbols drawn from a finite alphabet $\mathcal{X}$. If there exists a series of sample points
$\{n_i\}_{i=1}^\infty$ such that the sequence $\hat{p}^1(x)[x^{n_i}]$ converges to a distribution $p(x)$, $p(x)$ is said to be a limiting distribution for $x^\infty$.
Observe that for any finite-alphabet sequence $x^\infty$ at least one limiting distribution exists: $\hat{p}^1(x)[x^n]$ is an infinite sequence in
a compact set, so at least one convergent subsequence must exist.

Definition 3: Let $z^\infty$ be a finite-alphabet sequence. The set $\mathcal{M}_k(z^\infty)$ consists of all binary sequences $M^\infty$ such that there
exist partition lengths $\{L_i\}$, a resulting $k$-partition $\{(M^{L_i}, z^k)_i\}$, and a limiting distribution $p(L, M^L, z^k)$ for the sequence
$\{L_i, (M^{L_i}, z^k)_i\}$ such that

$$E_p[L] > H_p(M^L | z^k) + 1. \qquad (1)$$

We may interpret the set in the following manner. Suppose first that a "genie" partitions the source sequence $M^\infty$ into
an arbitrary series of variable-length blocks $\{(M^{L_i})_i\}$. Each block $(M^{L_i})_i$, of length $L_i$, is then source-coded with side
information $(z^k)_i$ at average rate less than $H_p(M^L | z^k) + 1$. The set $\mathcal{M}_k(z^\infty)$ consists of all source sequences that, in a sense,
allow such a genie/side-information-source-coding setup to compress strictly better than one bit per source symbol. One would
expect that the occurrence of such a set is a rare event when the source is drawn iid Bernoulli(1/2). This is formalized with
the following lemma.

Lemma 6: Let $z^\infty$ be a fixed finite-alphabet sequence, and let $M^\infty$ be drawn from an iid Bernoulli(1/2) process. Then the
probability that $M^\infty \in \mathcal{M}_k(z^\infty)$ is zero for all $k$.

Proof: See Appendix B. ■

We may easily expand this lemma to allow for common randomness.

Corollary 7: Let $z^\infty$ be a fixed binary sequence, let $M^\infty$ be drawn from an iid Bernoulli(1/2) process, and let $\theta^\infty$ be a
finite-alphabet sequence of arbitrary distribution that is independent of $M^\infty$. Then the probability that $M^\infty \in \mathcal{M}_k((z_i, \theta_i)_{i=1}^\infty)$
is zero for any $k$.

Proof: First observe that Lemma 6 does not require that $z^\infty$ be a binary sequence, only that it be of a finite alphabet.
As such, for a given sequence $\theta^\infty$ we may define the surrogate $z$-sequence $\tilde{z}^\infty = (z_i, \theta_i)_{i=1}^\infty$.

Applying Lemma 6 for this surrogate sequence, we have that for any fixed $z^\infty$ and $\theta^\infty$, the probability of drawing an
element of $\mathcal{M}_k((z_i, \theta_i)_{i=1}^\infty)$ from a Bernoulli(1/2) process is zero. Since $\theta^\infty$ is independent of $M^\infty$, the corollary follows. ■



B. Converse Lemma 

Although the converse results are presented as two distinct theorems, at their heart is the same argument. We present this 
core result in the following lemma. 

Lemma 8: Suppose an $s$-state $\ell$-lookahead FS scheme achieves $(R, \epsilon)$ on points $\{n_i\}$ for a specific source sequence $M^\infty$,
a specific channel noise sequence $z^\infty$, and a specific encoder/decoder common information sequence $\theta^\infty$. If for some $k \in \mathbb{Z}^+$
$M^\infty$ is not a member of $\mathcal{M}_k((z_i, \theta_i)_{i=1}^\infty)$, and if $\hat{H}^k(z^n) + \hat{H}^k(\theta^n) - \hat{H}^k((z_i, \theta_i)_{i=1}^n) \to 0$ as $n \to \infty$, then

$$R \le \frac{2\log s + \ell + 2}{k} + h_b(\epsilon) + 1 - \frac{1}{k} \limsup_{i\to\infty} \hat{H}^k(z^{n_i}).$$

The general idea in proving this lemma is to turn any given FS scheme into a source encoding/decoding scheme. Consider
an FS decoder that achieves $(R, \epsilon)$ on some points $\{n_i\}$, and ignore the minor complication of common randomness $\theta^\infty$. Given
only the channel output $y^\infty$, the decoder produces an estimate of the source sequence $M^\infty$. Knowing the source sequence
and the channel output, the decoder is technically capable of "simulating" the encoder and thereby obtaining both the channel
input sequence $x^\infty$ and the noise sequence $z^\infty$. One may therefore interpret the channel output $y^\infty$ as an encoding of the joint
source sequence $(M^\infty, z^\infty)$. The following proof utilizes a rigorous argument inspired by this intuition.

Proof: Let $e^\infty$ denote the error indication sequence $e_i = \mathbb{1}\{\hat{M}_i \neq M_i\}$. First, consider the $k$-partition of $(M^\infty, e^\infty, z^\infty, \theta^\infty)$
according to the given FS scheme:

$$\{(M^{L_i}, e^{L_i}, z^k, \theta^k)_i\}_{i=1}^\infty = \left\{\left(M_{p_{(i-1)k+1}}^{p_{ik+1}-1},\ e_{p_{(i-1)k+1}}^{p_{ik+1}-1},\ z_{(i-1)k+1}^{ik},\ \theta_{(i-1)k+1}^{ik}\right)\right\}_{i=1}^\infty.$$

In other words, let $(M^{L_i}, e^{L_i}, z^k, \theta^k)_i$ enumerate $k$-blocks of channel noise and common information, along with the source
bits that are estimated during each such block and the error indicators for these source bits. Let $L_i = p_{ik+1} - p_{(i-1)k+1}$ denote
the partition lengths.

Now define the sequence of points $\{n_i^*\} \subseteq \{n_i\}$ so that

$$\lim_{i\to\infty} \frac{1}{k} \hat{H}^k(z^{n_i^*}) = \limsup_{i\to\infty} \frac{1}{k} \hat{H}^k(z^{n_i}), \qquad (2)$$

and let $p(M^L, e^L, z^k, \theta^k)$ be a limiting distribution of $(M^L, e^L, z^k, \theta^k)_i$ on these points $\{n_i^*\}$. Recall from Def. 2 that such a
limiting distribution always exists.

Suppose random variables $(M^L, e^L, z^k, \theta^k)$ are distributed according to $p(M^L, e^L, z^k, \theta^k)$. We first use the FS scheme given
in the lemma statement to construct a lossless source encoder/decoder for $(M^L, z^k)$, with $\theta^k$ as side information. By later
requiring that the rate of this encoding exceed $H_p(M^L, z^k | \theta^k)$, the lemma may be proven.

E1 Let $j \in \mathbb{Z}^+$. We construct a codeword for the source block $(M^L, z^k)_j$ given side information block $(\theta^k)_j$ as follows:

1) To reduce clutter, we remove some of the unnecessary indices. Denote the source bits used by the FS encoder
during this $j$th block as $M^{L+\ell} = M_{p_{(j-1)k+1}}^{p_{jk+1}-1+\ell}$. Similarly, let the source bits estimated by the FS decoder be
referred to as $\hat{M}^L = \hat{M}_{p_{(j-1)k+1}}^{p_{jk+1}-1}$ and the error indicators as $e^L = e_{p_{(j-1)k+1}}^{p_{jk+1}-1}$. Additionally, let $s^{(e)} = s^{(e)}_{(j-1)k+1}$
and $s^{(d)} = s^{(d)}_{(j-1)k+1}$ indicate the initial encoder and decoder states, and let $x^k = x_{(j-1)k+1}^{jk}$ and $y^k = y_{(j-1)k+1}^{jk}$
denote the channel inputs and outputs during the block.

2) Apply a binary Huffman code to $e^L$ to create the compressed representation $g(e^L) \in \{0, 1\}^*$. The Huffman
code is designed according to the limiting empirical distribution $p(e^L)$.

3) Add the codeword $(s^{(e)}, s^{(d)}, M_{L+1}^{L+\ell}, y^k, g(e^L))$ to the codebook.

4) Observe that $(s^{(e)}, s^{(d)}, M_{L+1}^{L+\ell}, y^k, g(e^L))$ decodes uniquely into $(M^L, z^k)_j$ given side information block $(\theta^k)_j$:

• Simulate the channel decoding operation with initial state $s^{(d)}$, common information $\theta^k$, and channel output
$y^k$. This yields $\hat{M}^L$. Correcting for errors with the correctional information embedded in $g(e^L)$, we have
$M^L$.

• Simulate the channel encoding operation using initial state $s^{(e)}$, feedback $y^k$, common information $\theta^k$, source
$M^L$ from the previous step, and $M_{L+1}^{L+\ell}$ from the codeword. This yields the channel input $x^k$, which produces
$z^k$ when modulo-2 added to $y^k$.

Refer to this decoding operation on a codeword $c$ as $C^{-1}(c, \theta^k)$.

E2 Build the codebook by repeating step E1 for every block $(M^L, z^k, \theta^k)_j$, $j \in \mathbb{Z}^+$. Note that each codeword is of
length $2\log s + \ell + k + \mathrm{length}(g(e^L))$ (two states, $\ell$ lookahead bits, $k$ channel outputs, and the compressed error indicators) and losslessly decodes into its source block. Call this codebook $\mathcal{C}$.

E3 Define the codebook encoding function $F$ of a source sample $(M^L, z^k | \theta^k)$ as mapping to the shortest codeword in
the set $\{c \in \mathcal{C} : C^{-1}(c, \theta^k) = (M^L, z^k)\}$.

We now establish the expected length of this code when applied to the probabilistic source $(M^L, z^k | \theta^k)$. As mentioned in
step E2, a given codeword is of length $2\log s + \ell + k + \mathrm{length}(g(e^L))$, where $e^L$ is the error sequence. According to the
assumptions of the lemma, the bit error rate on points $\{n_i^*\}$ is upper-bounded by $\epsilon$. From this and from $p(e^L)$ being a limiting
distribution on $\{n_i^*\}$, the expected frequency of 1 in $e^L$ may be upper-bounded by $\epsilon$. Therefore, the expected length of $g(e^L)$
is upper-bounded by $k h_b(\epsilon) + 1$, and

$$E_p\left[\mathrm{length}(F(M^L, z^k | \theta^k))\right] \le 2\log s + \ell + k(1 + h_b(\epsilon)) + 1.$$

Since $F$ is a lossless variable-length encoder for sources drawn from $p(M^L, z^k | \theta^k)$, the expected codeword length must
be at least the conditional entropy according to $p$:

$$2\log s + \ell + k(1 + h_b(\epsilon)) + 1 \ge H_p(M^L, z^k | \theta^k)$$

$$= H_p(z^k | \theta^k) + H_p(M^L | z^k, \theta^k)$$

$$\overset{(a)}{=} H_p(z^k) + H_p(M^L | z^k, \theta^k)$$

$$\overset{(b)}{\ge} H_p(z^k) + E_p[L] - 1$$

$$\overset{(c)}{\ge} \limsup_{i\to\infty} \hat{H}^k(z^{n_i}) + Rk - 1,$$

where (a) holds because of the final assumption in the lemma statement, (b) follows from Corollary 7, and (c) is due to both
(2) and the lemma's assumption about rate $R$ being achieved on points $\{n_i^*\}$. Rearranging terms proves the lemma. ■



C. Proof of Converses 

Armed with Lemma 8, it is a relatively straightforward matter to prove Theorems 2 and 3.

Proof of Theorem 2: We first note that because $\theta^\infty$ is drawn iid and $z^\infty$ is fixed, $H^k(z^n) + H^k(\theta^n) - H^k((z_i, \theta_i)_{i=1}^n) \to 0$ with probability one for every $k$. Furthermore, by Corollary 7, $M^\infty \notin \mathcal{M}_k((z_i, \theta_i)_{i=1}^\infty)$ with probability one for every $k$.

Therefore, if $(R, \epsilon)$ is best-case-achieved with positive probability, it must then be achieved for some specific $(M^\infty, \theta^\infty)$ such that $M^\infty \notin \mathcal{M}_k((z_i, \theta_i)_{i=1}^\infty)$ and $H^k(z^n) + H^k(\theta^n) - H^k((z_i, \theta_i)_{i=1}^n) \to 0$ for every $k$. Let $\{n_i\}$ be the subsequence on which it is achieved.

Applying Lemma 8,

$$R \le \frac{2\log s + \ell + \sqrt{k}}{k} + h_b(\epsilon) + 1 - \frac{1}{k}\limsup_{i\to\infty} H^k(z^{n_i})$$

for any $k$.

Taking the limit supremum as $k \to \infty$,

$$\begin{aligned}
R &\le h_b(\epsilon) + 1 - \liminf_{k\to\infty}\frac{1}{k}\limsup_{i\to\infty} H^k(z^{n_i}) \\
&\le h_b(\epsilon) + 1 - \liminf_{k\to\infty}\frac{1}{k}\liminf_{i\to\infty} H^k(z^{n_i}) \\
&\le h_b(\epsilon) + 1 - \liminf_{k\to\infty}\frac{1}{k}\liminf_{n\to\infty} H^k(z^{n}) \\
&= h_b(\epsilon) + 1 - \underline{\rho}(z^\infty).
\end{aligned}$$
Fig. 6. A finite-state active feedback scheme for a modulo-additive channel.



Proof of Theorem 3: If $(R, \epsilon)$ is worst-case-achieved with positive probability, then it must be worst-case achieved for some $(M^\infty, \theta^\infty)$ such that $M^\infty \notin \mathcal{M}_k((z_i, \theta_i)_{i=1}^\infty)$ (because by Corollary 7 this occurs with probability one) and $H^k(z^n) + H^k(\theta^n) - H^k((z_i, \theta_i)_{i=1}^n) \to 0$ (because this occurs with probability one when $\theta^\infty$ is chosen iid).

We may therefore apply Lemma 8 with $\{n_i\} = \mathbb{Z}^+$. For any $k$, we have that

$$R \le \frac{2\log s + \ell + \sqrt{k}}{k} + h_b(\epsilon) + 1 - \frac{1}{k}\limsup_{n\to\infty} H^k(z^n).$$

Since this holds for arbitrary $k$, we may take the limit infimum of the expression with $k \to \infty$:

$$\begin{aligned}
R &\le \liminf_{k\to\infty}\left(\frac{2\log s + \ell + \sqrt{k}}{k} + h_b(\epsilon) + 1 - \frac{1}{k}\limsup_{n\to\infty} H^k(z^n)\right) \\
&= h_b(\epsilon) + 1 - \limsup_{k\to\infty}\frac{1}{k}\limsup_{n\to\infty} H^k(z^n) \\
&= h_b(\epsilon) + 1 - \bar{\rho}(z^\infty).
\end{aligned}$$



VIII. Proof of Achievability 

In this section, a sequence of FS schemes is constructed and guaranteed to achieve both the best-case and worst-case bounds 
for any channel noise sequence z°°. This "universal achievability" is analogous to the universal source coding achievability 
scheme introduced by Ziv and Lempel [2].



A. Some Classes of Schemes 

To begin, several additional classes of schemes are introduced, and their relationships with each other and with class FS are 
clarified. This will prove useful in constructing the universal achievability schemes. 

A finite-state active feedback (FSAF) scheme is a variation of the class FS that allows for active feedback (Fig. 6). It consists of the following:

1) An encoder state variable $s_i^{(e)}$, a decoder state variable $s_i^{(d)}$, and a feedback state variable $s_i^{(f)}$, all taking values in a finite set.

2) A source pointer $p_i$ and a finite lookahead constant $\ell$.

3) An iid common randomness source $\theta_i \sim p_\theta$.

4) A finite-state feedback channel whose output at time $i$ is distributed according to $U_i \sim p_{U|y,s}(u_i \mid y_i, s_i^{(f)})$.

5) An encoding function $x_i = e(s_i^{(e)}, M_{p_i}^{p_i+\ell}, \theta_i, u_i)$.

6) A decoding length function $L_i = d_L(s_i^{(d)}, y_i, \theta_i, u_i)$ that also determines the update of the source pointer: $p_{i+1} = p_i + L_i$.

7) A decoding function $\hat{M}_{p_i}^{p_{i+1}-1} = d_M(s_i^{(d)}, y_i, \theta_i, u_i)$.

8) State-update functions for the encoder, $s_{i+1}^{(e)} = f^{(e)}(s_i^{(e)}, M_{p_i}^{p_i+\ell}, \theta_i, u_i)$, the decoder, $s_{i+1}^{(d)} = f^{(d)}(s_i^{(d)}, y_i, \theta_i, u_i)$, and the feedback channel, $s_{i+1}^{(f)} = f^{(f)}(s_i^{(f)}, y_i, \theta_i)$.

Lemma 9: The class of schemes FSAF is equivalent to class FS. 

Proof: By setting $U_i = y_i$, we find that FS is a special case of FSAF. To show the other direction, first assume we are given an FSAF scheme. By the arguments that follow, we will construct an FS scheme that simulates it. Quantities relating to the constructed FS scheme will be notated with a "hat," e.g. $\hat{\theta}$, $\hat{s}_i$, etc.

First, let $U$ be a random vector with components indexed by $y \in \mathcal{X}$ and $s \in \mathcal{S}^{(f)}$, and let the $(y, s)$th component be distributed according to $U_{y,s} \sim p_{U|y,s}$, where $p_{U|y,s}$ is the given FSAF scheme's feedback channel transition matrix. Furthermore, let the components be independently distributed. We then define the FS scheme's common randomness as $\hat{\theta}_i = (\theta_i, U_i)$, distributed iid according to $p_\theta\, p_U$, where $p_\theta$ is the common randomness distribution for the given FSAF scheme. Let $(U_{y,s})_i$ denote the $(y, s)$th component of the $i$th random vector $U_i$.

Fig. 7. A finite-extent scheme.

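The equivalent FS scheme is assigned encoder and decoder state variables $\hat{s}_i^{(e)} = (s_i^{(e)}, s_i^{(f)})$ and $\hat{s}_i^{(d)} = (s_i^{(d)}, s_i^{(f)})$, where $s_i^{(e)}$, $s_i^{(d)}$, and $s_i^{(f)}$ are the state variables for the FSAF scheme. The encoder/decoder update functions are given by

$$\hat{f}^{(e)}\big(\hat{s}_i^{(e)}, M_{p_i}^{p_i+\ell}, y_i, \hat{\theta}_i\big) = \Big(f^{(e)}\big(s_i^{(e)}, M_{p_i}^{p_i+\ell}, \theta_i, (U_{y_i, s_i^{(f)}})_i\big),\; f^{(f)}\big(s_i^{(f)}, y_i, \theta_i\big)\Big)$$

and

$$\hat{f}^{(d)}\big(\hat{s}_i^{(d)}, y_i, \hat{\theta}_i\big) = \Big(f^{(d)}\big(s_i^{(d)}, y_i, \theta_i, (U_{y_i, s_i^{(f)}})_i\big),\; f^{(f)}\big(s_i^{(f)}, y_i, \theta_i\big)\Big).$$

Observe how the randomness of the FSAF scheme's active feedback channel is simulated by means of the common randomness $\hat{\theta}_i = (\theta_i, U_i)$.

In this manner the FSAF encoder/decoder/feedback state machines are simulated by the FS encoder/decoder state machines. The FSAF encoding and decoding functions may be implemented in a similar manner. For the decoding function,

$$\hat{d}_M\big(\hat{s}_i^{(d)}, y_i, \hat{\theta}_i\big) = d_M\big(s_i^{(d)}, y_i, \theta_i, (U_{y_i, s_i^{(f)}})_i\big),$$

and the encoding function is simulated analogously.

Since this constructed FS scheme behaves identically to the given FSAF scheme, FSAF is a special case of FS. ■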
A finite-extent (FE) scheme $\mathcal{F}$ for a channel with alphabet $\mathcal{X}$, as depicted in Fig. 7, consists of:

1) An extent $n$.

2) A feedback channel with transition probabilities given by $U_i \sim p_i(u_i \mid u^{i-1}, y^i)$ and taking values in $\mathcal{X}$, for $i \in \{1, \ldots, n\}$.

3) A common randomness variable $\theta$ drawn from a finite alphabet, independent of the source, and provided to both encoder and decoder.

4) Encoding functions $x_1 = e_1(M^\infty, \theta)$, $x_2 = e_2(M^\infty, \theta, u_1)$, ..., $x_n = e_n(M^\infty, \theta, u^{n-1})$.

5) A decoding length function $L = d_L(y^n, \theta, u^{n-1})$, upper bounded by $n \log|\mathcal{X}|$.

6) A decoding function $\hat{M}^L = d_M(y^n, \theta, u^{n-1})$.

A repetition scheme is constructed from an $n$-extent FE scheme $\mathcal{F}$. Let $\mathcal{F}(M^\infty, z^n)$ describe the application of scheme $\mathcal{F}$ to source $M^\infty$ and noise block $z^n$. Then the repetition scheme $\bar{\mathcal{F}}$ consists of repeated independent uses of $\mathcal{F}$, i.e.

$$\bar{\mathcal{F}}(M^\infty, z^\infty) = \Big\{\mathcal{F}\big((M_1^{n}, 0^\infty), z_1^n\big),\; \mathcal{F}\big((M_{L_1+1}^{L_1+n}, 0^\infty), z_{n+1}^{2n}\big),\; \ldots\Big\}.$$

In each block, $\mathcal{F}$ is applied to a "virtual source" consisting of the first $n$ bits of the source that have yet to be transmitted and a string of 0s.
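The block structure of a repetition scheme can be sketched in a few lines. This is a toy illustration only, not the construction of Lemma 11: the names `repetition_scheme` and `half_rate_fe` are hypothetical, the FE scheme is modeled as a plain function from (virtual source, noise block) to the decoded bits, feedback and common randomness are omitted, and the zero padding $0^\infty$ is truncated to the block length.

```python
from typing import Callable, List

Bits = List[int]


def half_rate_fe(virtual_source: Bits, noise_block: Bits) -> Bits:
    """Toy n-extent scheme (an assumption for illustration): send each of the
    first n // 2 source bits twice and decode from the first copy."""
    n = len(noise_block)
    sent = [b for b in virtual_source[: n // 2] for _ in range(2)]
    received = [x ^ z for x, z in zip(sent, noise_block)]  # modulo-2 additive noise
    return received[0::2]  # L = n // 2 decoded bits


def repetition_scheme(fe: Callable[[Bits, Bits], Bits],
                      source: Bits, noise: Bits, n: int) -> Bits:
    """Apply the n-extent scheme `fe` independently to consecutive noise blocks."""
    decoded: Bits = []
    pointer = 0  # index of the first source bit not yet transmitted
    for start in range(0, len(noise) - n + 1, n):
        fresh = source[pointer: pointer + n]
        virtual_source = fresh + [0] * (n - len(fresh))  # pad with zeros
        estimate = fe(virtual_source, noise[start: start + n])
        decoded.extend(estimate)
        pointer += len(estimate)  # advance by L_i, not by n
    return decoded


source = [1, 0, 1, 1, 0, 0, 1, 0]
clean = repetition_scheme(half_rate_fe, source, [0] * 8, n=4)
noisy = repetition_scheme(half_rate_fe, source, [1, 0, 0, 0, 0, 0, 0, 0], n=4)
```

Note that the source pointer advances by the number of bits decoded in each block rather than by the block length, which is what makes each block's virtual source start at the first untransmitted bit.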

Proposition 10: The class of repetition schemes is a subclass of FSAF schemes (and therefore of FS schemes). 
This follows directly from two properties of repetition schemes: 

• The block-based structure allows for implementation with finite-state machines. 

• A repetition scheme constructed from an n-extent FE scheme has finite lookahead constant n. 
B. Universal Scheme Construction

At the core of the achievability scheme is a lemma, introduced and proven by Shayevitz and Feder [5]:

Lemma 11: (Shayevitz and Feder, 2009) Let $\mathcal{X}$ be a finite alphabet with an addition operation. Then there exists a sequence of $n$-extent FE schemes $\mathcal{F}_n(\mathcal{X})$ with the following worst-case performance guarantees over all additive noise sequences $z^\infty \in \mathcal{X}^\infty$ and source sequences $M^\infty \in \{0, 1\}^\infty$:

$$\sup_{z^n \in \mathcal{X}^n,\; M^\infty \in \{0,1\}^\infty} P\big(\hat{M}^L \ne M^L\big) \le \epsilon(n) \tag{3}$$

and

$$\inf_{z^n \in \mathcal{X}^n,\; M^\infty \in \{0,1\}^\infty} P\left(\frac{L}{n} \ge 1 - H^1(z^n) - \epsilon(n)\right) \ge 1 - \epsilon(n), \tag{4}$$

where $\epsilon(n) \to 0$.

Note that the only randomness in the above probabilistic statements is due to the randomness in the feedback channel. 

Observe that Lemma 11 concerns itself with only the first-order empirical entropy $H^1(z^n)$. By specializing to binary sequences, this may be replaced by higher-order empirical entropies.

Corollary 12: For binary additive noise channels with feedback, there exists a sequence of finite-extent schemes $\mathcal{F}^m$ with extents $N(m) \to \infty$ and the following performance guarantees:

$$\sup_{z^{N(m)} \in \{0,1\}^{N(m)},\; M^\infty \in \{0,1\}^\infty} P\big(\hat{M}^L \ne M^L\big) \le \epsilon_m$$

and

$$\inf_{z^{N(m)} \in \{0,1\}^{N(m)},\; M^\infty \in \{0,1\}^\infty} P\left(\frac{L}{N(m)} \ge 1 - \frac{1}{m} H^m\big(z^{N(m)}\big) - \epsilon_m\right) \ge 1 - \epsilon_m,$$

where $\epsilon_m \to 0$.

Proof: For a given $m$, consider the $m$-tuple supersymbol channel characterized by inputs $X_i = x_{(i-1)m+1}^{im}$, noise $Z_i = z_{(i-1)m+1}^{im}$, and outputs $Y_i = (x_{(i-1)m+1} + z_{(i-1)m+1},\; x_{(i-1)m+2} + z_{(i-1)m+2},\; \ldots,\; x_{im} + z_{im})$. Applying Lemma 11 to channels of this alphabet yields a sequence of schemes $\mathcal{F}_n(\{0,1\}^m)$ with

$$\epsilon_{n,m} \to 0 \quad \text{as } n \to \infty. \tag{5}$$

Observe that $\mathcal{F}_n(\{0,1\}^m)$ may be seen as a finite-extent scheme both for the supersymbol alphabet $\{0,1\}^m$ additive noise channel and for the (fundamental) binary alphabet $\{0,1\}$ additive noise channel.

By (5) we may choose $N(m)$ so that $\epsilon_{N(m),m} \to 0$ as $m \to \infty$. Denoting $\mathcal{F}^m = \mathcal{F}_{N(m)}(\{0,1\}^m)$, this proves the corollary. ■

The sequence of finite-extent schemes $\{\mathcal{F}^m\}_{m=1}^\infty$ forms the basis of the universal achievability construction.
Definition 4: The universal achievability scheme of order $m$ is the repetition scheme $\bar{\mathcal{F}}^m$ formed from the $N(m)$-extent scheme $\mathcal{F}^m$.

We end with an important lemma regarding repetition schemes. 

Lemma 13: Let $\mathcal{F}$ be an $N$-extent FE scheme and let $\bar{\mathcal{F}}$ be the corresponding repetition scheme. Let $E_i$ be the error indicator for the $i$th block, and define $T_i = 1_{\{L_i \ge a_i\}}$ so as to indicate whether the number of bits transmitted in the $i$th block exceeds a fixed threshold $a_i$. Then the Markov relations

$$E_i - M_{(i)}^N - E^{i-1} \tag{6}$$

and

$$T_i - M_{(i)}^N - T^{i-1} \tag{7}$$

both hold, where $M_{(i)}^N = M_{\sum_{j=1}^{i-1} L_j + 1}^{\sum_{j=1}^{i-1} L_j + N}$ denotes the $N$ source samples used in the $i$th block by the repetition scheme.

Proof: See Appendix C.



C. Proving Achievability (Theorem 4)

Two additional lemmas, regarding the limiting behavior of random binary sequences, are required in order to prove that $\bar{\mathcal{F}}^m$ achieves the performance promised in Theorem 4.

Lemma 14: Suppose $\{X_i\}$ is a sequence of iid Bernoulli($p$) random variables, and suppose $\{a_i\}$ is a bounded sequence of real numbers. Then with probability one,

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n X_i a_i = p \limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n a_i$$

and

$$\liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^n X_i a_i = p \liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^n a_i.$$

Proof: Let $Y_i = a_i X_i$. Since $a_i$ is bounded, there exists a constant $A$ such that $|a_i| < A$. We then have that $E[Y_i^2] \le A^2$ is bounded and $\sum_{k=1}^\infty \frac{1}{k^2} \operatorname{var} Y_k \le A^2 \sum_{k=1}^\infty \frac{1}{k^2} < \infty$. These two statements qualify the use of Kolmogorov's strong law, which states that $\frac{1}{n}\sum_{i=1}^n Y_i - \frac{1}{n}\sum_{i=1}^n E[Y_i] \to 0$ with probability one. Since $E[Y_i] = p\,a_i$, this proves the lemma. ■
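Lemma 14 is easy to probe numerically. The sketch below is only a finite-sample sanity check under assumed parameters (the weight pattern, $p = 0.3$, the seed, and the function name `cesaro_pair` are all choices made for this example); it compares the weighted average $\frac{1}{n}\sum X_i a_i$ with $p \cdot \frac{1}{n}\sum a_i$ for a bounded, non-constant weight sequence.

```python
import random


def cesaro_pair(p, weights, seed=0):
    """Return ((1/n) * sum X_i a_i, (1/n) * sum a_i) for iid Bernoulli(p) X_i."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(weights)
    weighted = sum(a for a in weights if rng.random() < p) / n
    plain = sum(weights) / n
    return weighted, plain


# Bounded weights that switch between 1.0 and 0.2 every 1000 steps.
weights = [1.0 if (i // 1000) % 2 == 0 else 0.2 for i in range(200_000)]
weighted, plain = cesaro_pair(0.3, weights)
gap = abs(weighted - 0.3 * plain)
print(f"gap between (1/n) sum X_i a_i and p (1/n) sum a_i: {gap:.5f}")
```

A small gap at one value of $n$ does not prove the almost-sure statement, of course; it only illustrates the direction of the claim.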

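Lemma 15: Let $\{a_i\}$ be a bounded real-valued sequence, and let $\{X_i\}$ be a random binary process. If $P(X_i = 1 \mid X^{i-1} = x^{i-1}) \le p$ for any $x^{i-1} \in \{0,1\}^{i-1}$, then with probability one

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n a_i X_i \le p\,\bar{a} \tag{8}$$

$$\liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^n a_i X_i \le p\,\underline{a} \tag{9}$$

where $\bar{a} = \limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n a_i$ and $\underline{a} = \liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^n a_i$. Similarly, if $P(X_i = 1 \mid X^{i-1} = x^{i-1}) \ge p$ for any $x^{i-1} \in \{0,1\}^{i-1}$,

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n a_i X_i \ge p\,\bar{a} \tag{10}$$

$$\liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^n a_i X_i \ge p\,\underline{a}. \tag{11}$$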

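Proof: Let $\{Y_i\}$ be a Bernoulli-$p$ iid process. We will construct correlated binary processes $\{\tilde{X}_i, \tilde{Y}_i\}$ whose marginal distributions are identical to those of $\{X_i\}$ and $\{Y_i\}$, and use these to prove the lemma.
• Let $\{U_i\}$ be a sequence of independent uniform $[0, 1]$ random variables.
• For each $i$ let

$$\tilde{X}_i = \begin{cases} 1 & \text{if } U_i \le P\big(X_i = 1 \mid X^{i-1} = \tilde{X}^{i-1}\big) \\ 0 & \text{otherwise} \end{cases}$$

and

$$\tilde{Y}_i = \begin{cases} 1 & \text{if } U_i \le p \\ 0 & \text{otherwise.} \end{cases}$$

By Lemma 14, $\limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n a_i \tilde{Y}_i = p\,\bar{a}$ and $\liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^n a_i \tilde{Y}_i = p\,\underline{a}$.
• Consider the case where $P(X_i = 1 \mid X^{i-1} = x^{i-1}) \le p$. Then since $\tilde{X}_i \le \tilde{Y}_i$ we have that

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n a_i \tilde{X}_i \le p\,\bar{a}$$

and

$$\liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^n a_i \tilde{X}_i \le p\,\underline{a}$$

with probability one. Since $\{\tilde{X}_i\}$ has the same distribution as $\{X_i\}$, this proves (8) and (9).
• Similarly, in the case where $P(X_i = 1 \mid X^{i-1} = x^{i-1}) \ge p$, the relation $\tilde{X}_i \ge \tilde{Y}_i$ always holds and we have (10) and (11).

■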
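The coupling used in this proof can be written out directly. The following is a minimal sketch under stated assumptions: `coupled_processes` and the example process `sticky` are hypothetical names, and `sticky` is just one arbitrary process with memory whose conditional probability of a 1 never exceeds $p = 0.3$. Both processes are driven by the same uniform variable $U_i$, so the dominance $\tilde{X}_i \le \tilde{Y}_i$ holds sample by sample.

```python
import random


def coupled_processes(cond_prob, p, n, seed=0):
    """Generate (X~, Y~) from shared uniforms U_i.

    cond_prob(history) must return P(X_i = 1 | past X's); the dominance
    X~_i <= Y~_i relies on the assumption cond_prob(history) <= p.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.random()
        xs.append(1 if u <= cond_prob(xs) else 0)
        ys.append(1 if u <= p else 0)
    return xs, ys


def sticky(history):
    """Hypothetical binary process with memory: a 1 is likelier after a 1."""
    return 0.3 if history and history[-1] == 1 else 0.1


xs, ys = coupled_processes(sticky, p=0.3, n=10_000)
dominated = all(x <= y for x, y in zip(xs, ys))
```

For nonnegative weights $a_i$, the pointwise dominance immediately gives $\sum a_i \tilde{X}_i \le \sum a_i \tilde{Y}_i$ for every $n$, which is the step that transfers the limits of Lemma 14 to the dependent process.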

Proof of Theorem 5: Suppose $\bar{\mathcal{F}}^m$ is applied to a source $M^\infty$ and noise sequence $z^\infty$. Let $\{L_i\}$ be the number of source bits decoded in each block. Consider the $i$th block, i.e.

$$\mathcal{F}^m\Big(\big(M_{\sum_{j=1}^{i-1} L_j + 1}^{\sum_{j=1}^{i-1} L_j + N(m)},\, 0^\infty\big),\; z_{(i-1)N(m)+1}^{iN(m)}\Big).$$

Let $M_{(i)}^{N(m)} = M_{\sum_{j=1}^{i-1} L_j + 1}^{\sum_{j=1}^{i-1} L_j + N(m)}$ indicate the source bits used by the encoder, let $\hat{M}_{(i)}^{L_i} = \hat{M}_{\sum_{j=1}^{i-1} L_j + 1}^{\sum_{j=1}^{i} L_j}$ indicate the estimate produced, let $R_i = L_i / N(m)$ indicate the rate of the block, and let $z_{(i)}^{N(m)} = z_{(i-1)N(m)+1}^{iN(m)}$ denote the noise of the block. Finally, let $u_{(i)}^{N(m)} = u_{(i-1)N(m)+1}^{iN(m)}$ denote the feedback during the block.



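Rate Guarantees

Recalling that $\epsilon_m$ denotes the rate-error guarantee in Corollary 12, define the "rate indicator" $T_i$ as the indication of whether the rate in the $i$th block exceeds the threshold $1 - \frac{1}{m} H^m(z_{(i)}^{N(m)}) - \epsilon_m$, i.e.

$$T_i = 1\left\{R_i \ge 1 - \frac{1}{m} H^m\big(z_{(i)}^{N(m)}\big) - \epsilon_m\right\}. \tag{12}$$

One may demonstrate that for any $i$ and any $t^{i-1} \in \{0,1\}^{i-1}$, $P(T_i = 1 \mid T^{i-1} = t^{i-1}) \ge 1 - \epsilon_m$:

$$\begin{aligned}
P\big(T_i = 1 \mid T^{i-1} = t^{i-1}\big) &\overset{(a)}{=} \sum_{m^{N(m)} \in \{0,1\}^{N(m)}} P\big(T_i = 1 \mid M_{(i)}^{N(m)} = m^{N(m)}\big)\, P\big(M_{(i)}^{N(m)} = m^{N(m)} \mid T^{i-1} = t^{i-1}\big) \\
&\overset{(b)}{\ge} 1 - \epsilon_m,
\end{aligned} \tag{13}$$

where step (a) follows from the Markov relation (Lemma 13) and step (b) is due to the rate guarantee in Corollary 12.
Applying Lemma 15 to $\{T_i\}$, (10) bounds the limit supremum rate of the scheme.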



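$$\begin{aligned}
\limsup_{k\to\infty} \frac{1}{k}\sum_{i=1}^k R_i &\overset{(a)}{\ge} \limsup_{k\to\infty} \frac{1}{k}\sum_{i=1}^k R_i T_i \\
&\overset{(b)}{\ge} \limsup_{k\to\infty} \frac{1}{k}\sum_{i=1}^k \left(1 - \frac{1}{m} H^m\big(z_{(i)}^{N(m)}\big) - \epsilon_m\right) T_i \\
&\overset{(c)}{\ge} (1 - \epsilon_m) \limsup_{k\to\infty} \frac{1}{k}\sum_{i=1}^k \left(1 - \frac{1}{m} H^m\big(z_{(i)}^{N(m)}\big) - \epsilon_m\right) \quad \text{w.p.1} \\
&\overset{(d)}{\ge} (1 - \epsilon_m) \limsup_{k\to\infty} \left(1 - \frac{1}{m} H^m\big(z^{kN(m)}\big) - \epsilon_m\right) \quad \text{w.p.1} \\
&\overset{(e)}{\ge} (1 - \epsilon_m)\big(1 - \underline{\rho}(z^\infty) - \underline{\delta}_m(z^\infty)\big) \quad \text{w.p.1},
\end{aligned}$$

where step (a) is due to $T_i \le 1$, step (b) follows from the definition of $T_i$, step (c) is an application of (10) from Lemma 15, step (d) comes from the concavity of the entropy function, and step (e) involves the definition

$$\underline{\delta}_m(z^\infty) = \epsilon_m + \liminf_{k\to\infty} \frac{1}{m} H^m\big(z^{kN(m)}\big) - \underline{\rho}(z^\infty). \tag{14}$$

Since $\liminf_{k\to\infty} H^m(z^{kN(m)}) = \liminf_{k\to\infty} H^m(z^k)$ and, by Lemma 1, $\underline{\rho}(z^\infty) = \lim_{m\to\infty} \liminf_{k\to\infty} \frac{1}{m} H^m(z^k)$, we have that $\underline{\delta}_m(z^\infty)$ vanishes with increasing $m$.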
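The concavity step, which compares the average of the per-block entropies with the entropy of the concatenated sequence, can be checked on a small example. The sketch below is illustrative only: the helper `block_entropy`, the block length `N = 8`, the order `m = 2`, and the noise sequence are all assumptions chosen so that the numbers are easy to follow. Each block is internally constant in its $m$-tuples, so every per-block entropy is zero, while the concatenation mixes three tuple types.

```python
from collections import Counter
from math import log2


def block_entropy(x, m):
    """m-th order block-by-block empirical entropy in bits,
    computed over non-overlapping m-tuples of x."""
    tuples = [tuple(x[i:i + m]) for i in range(0, len(x) - m + 1, m)]
    total = len(tuples)
    return -sum(c / total * log2(c / total) for c in Counter(tuples).values())


N, m = 8, 2
z = [0] * 8 + [1] * 8 + [0, 1] * 4          # three blocks of length N
per_block = [block_entropy(z[i:i + N], m) for i in range(0, len(z), N)]
average = sum(per_block) / len(per_block)    # average of per-block entropies
whole = block_entropy(z, m)                  # entropy of the concatenation
print(average, whole)
```

Because the empirical distribution of the whole sequence is the average of the per-block empirical distributions (when $m$ divides $N$), concavity of entropy gives `average <= whole`, which is the inequality used in step (d).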

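A similar line of logic can demonstrate the limit infimum rate bound:

$$\begin{aligned}
\liminf_{k\to\infty} \frac{1}{k}\sum_{i=1}^k R_i &\ge \liminf_{k\to\infty} \frac{1}{k}\sum_{i=1}^k R_i T_i \\
&\ge \liminf_{k\to\infty} \frac{1}{k}\sum_{i=1}^k \left(1 - \frac{1}{m} H^m\big(z_{(i)}^{N(m)}\big) - \epsilon_m\right) T_i \\
&\ge (1 - \epsilon_m) \liminf_{k\to\infty} \frac{1}{k}\sum_{i=1}^k \left(1 - \frac{1}{m} H^m\big(z_{(i)}^{N(m)}\big) - \epsilon_m\right) \quad \text{w.p.1} \\
&\ge (1 - \epsilon_m) \liminf_{k\to\infty} \left(1 - \frac{1}{m} H^m\big(z^{kN(m)}\big) - \epsilon_m\right) \quad \text{w.p.1} \\
&\ge (1 - \epsilon_m)\big(1 - \bar{\rho}(z^\infty) - \bar{\delta}_m(z^\infty)\big) \quad \text{w.p.1},
\end{aligned}$$

where in the final step we define

$$\bar{\delta}_m(z^\infty) = \epsilon_m + \limsup_{k\to\infty} \frac{1}{m} H^m\big(z^{kN(m)}\big) - \bar{\rho}(z^\infty). \tag{15}$$

As in the previous case, since $\limsup_{k\to\infty} H^m(z^{kN(m)}) = \limsup_{k\to\infty} H^m(z^k)$ and, by Lemma 1, $\bar{\rho}(z^\infty) = \lim_{m\to\infty} \limsup_{k\to\infty} \frac{1}{m} H^m(z^k)$, we have that $\bar{\delta}_m(z^\infty)$ vanishes as $m \to \infty$.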



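Error Guarantees

Let $E_i$ indicate the presence of an error in the $i$th block, i.e. $E_i = 1 - 1\{\hat{M}_{(i)}^{L_i} = M_{(i)}^{L_i}\}$. The limit-supremum bit-error rate may be written in terms of $\{E_i\}$ and $\{L_i\}$ as

$$\begin{aligned}
\limsup_{n\to\infty} \frac{\sum_{i=1}^n E_i L_i}{\sum_{i=1}^n L_i} &= \limsup_{n\to\infty} \frac{\frac{1}{n}\sum_{i=1}^n E_i L_i}{\frac{1}{n}\sum_{i=1}^n L_i} \\
&\overset{(a)}{\le} \frac{N(m) \limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n E_i}{\liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^n L_i} \\
&\overset{(b)}{\le} \frac{N(m) \limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n E_i}{N(m)\big(1 - \bar{\rho}(z^\infty) - \bar{\delta}_m(z^\infty)\big)} \quad \text{w.p.1},
\end{aligned} \tag{16}$$

where (a) holds because $L_i \le N(m)$ for an $N(m)$-horizon repetition scheme, and (b) follows with probability one from (15).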

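As was done with $\{T_i\}$, one may demonstrate that for any $i$ and $e^{i-1} \in \{0,1\}^{i-1}$, the bound $P(E_i = 1 \mid E^{i-1} = e^{i-1}) \le \epsilon_m$ holds:

$$\begin{aligned}
P\big(E_i = 1 \mid E^{i-1} = e^{i-1}\big) &\overset{(a)}{=} \sum_{m^{N(m)} \in \{0,1\}^{N(m)}} P\big(E_i = 1 \mid M_{(i)}^{N(m)} = m^{N(m)}\big)\, P\big(M_{(i)}^{N(m)} = m^{N(m)} \mid E^{i-1} = e^{i-1}\big) \\
&\overset{(b)}{\le} \epsilon_m,
\end{aligned} \tag{17}$$

where step (a) follows from the Markov relation (Lemma 13) and step (b) is due to the error bound in Corollary 12. This allows for the application of Lemma 15 to $\{E_i\}$ with constant weights $a_i = 1$, establishing that

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n E_i \le \epsilon_m \quad \text{w.p.1}. \tag{18}$$

This in turn may be inserted into (16), proving

$$\limsup_{n\to\infty} \frac{\sum_{i=1}^n E_i L_i}{\sum_{i=1}^n L_i} \le \frac{\epsilon_m}{1 - \bar{\rho}(z^\infty) - \bar{\delta}_m(z^\infty)} \quad \text{w.p.1}.$$

Therefore scheme $\bar{\mathcal{F}}^m$ worst-case (and best-case) achieves a bit-error rate no greater than $\epsilon_m / \big(1 - \bar{\rho}(z^\infty) - \bar{\delta}_m(z^\infty)\big)$. ■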



IX. Predictability and a Simpler Sub-Optimal Scheme 

The finite-state predictability was introduced by Feder et al. in [13] as an analog of compressibility in the context of 
universal prediction, just as porosity is an analog in the context of modulo-additive channels. We explore the relationship 
between porosity and predictability, but we do so with fairly pragmatic motivations. 



A. Practicality of $\{\bar{\mathcal{F}}^m\}$

While the achievability schemes $\{\bar{\mathcal{F}}^m\}$ manage to asymptotically achieve porosity for any sequence, they are not particularly simple to implement. The complexity of $\bar{\mathcal{F}}^m$ is hidden within the Shayevitz-Feder empirical-capacity-achieving scheme at its core (Corollary 12). At each time instant, the Shayevitz-Feder decoder is required to compute the posterior of the message given all the channel outputs in the block so far. This computation is linear in the alphabet size, but because $\bar{\mathcal{F}}^m$ applies Shayevitz-Feder to binary $m$-tuples (Corollary 12), it is exponential in $m$.

Although it grows in complexity quite rapidly, this Horstein-based approach of Shayevitz and Feder is actually quite efficient for small alphabets, e.g. binary. The only reason it is applied to $m$-tuples in our construction is to account for memory and correlation within the noise sequence. Alternatively stated, we seek to achieve the $m$th-order empirical capacity for arbitrarily large $m$. The simpler repetition schemes suggested in this section take a layered approach, wherein memory and correlation are first "removed" from the noise sequence, after which the binary-alphabet Shayevitz-Feder scheme is used to communicate with the decoder.
Fig. 8. Block diagram for the scheme $\mathcal{G}_n$.
Fig. 9. An alternative block diagram for scheme $\mathcal{G}_n$, where the prediction loop is represented as a surrogate noise sequence $\tilde{z}^n$.



B. Construction of layered scheme 

In defining the more practical repetition schemes $\bar{\mathcal{G}}_n$, we first describe the finite-extent schemes $\{\mathcal{G}_n\}$ at their heart. As shown in Fig. 8, in its inner layer $\mathcal{G}_n$ consists of a predictor that forms an estimate $\hat{z}_i$ for the noise $z_i$ at each time $i \in \{1, \ldots, n\}$. By subtracting this prediction from the encoder's output, a "surrogate" channel is created with "effective" noise $\tilde{z}^n = (z_i - \hat{z}_i)_{i=1}^n$. In other words, the noise $z^n$ is replaced by a sequence of error indications for the predictor (Fig. 9). The first-order finite-extent scheme $\mathcal{F}_n(\{0,1\})$ (defined in Lemma 11), which we will refer to simply as $\mathcal{F}_n$, is then applied to this surrogate channel. Roughly speaking, the prediction step exploits the memory and correlation in $z^n$ in order to reduce its first-order empirical entropy, which then serves to boost the performance of $\mathcal{F}_n$.

To aid in formally constructing $\{\mathcal{G}_n\}$, define the following:

• $\{e_i^{\mathcal{F}}\}$ refers to the encoding functions for $\mathcal{F}_n$.

• $d_L^{\mathcal{F}}$ and $d_M^{\mathcal{F}}$ refer to the decoding-length and decoding-message functions of $\mathcal{F}_n$.

• $\{p_i(u_i \mid u^{i-1}, y^i)\}$ is the set of feedback conditional distributions for $\mathcal{F}_n$, and $\theta_{\mathcal{F}}$ is the common randomness.

• $\hat{z}_i = f(z^{i-1})$ is the estimation function implemented by the prediction scheme (which has yet to be described).

The prediction-based FE scheme $\mathcal{G}_n$ can then be defined in terms of the six components listed in the definition of FE schemes:

1) The extent is $n$.

2) The feedback channel is a simple delay-one noiseless feedback: $u_i = y_{i-1}$.

3) The common randomness consists of $\theta_{\mathcal{F}}$ together with, for each $i$, a random vector $U^{(i)}$ whose components are indexed by $(y^i, u^{i-1})$. The $(y^i, u^{i-1})$th component $U^{(i)}_{y^i, u^{i-1}}$ is distributed according to $p_i(\cdot \mid y^i, u^{i-1})$. Note that $U^{(i)}$ is introduced only to simulate the feedback channel of scheme $\mathcal{F}_n$ at both encoder and decoder. This is similar to the technique used in the proof of Lemma 9.

4) To define the $i$th encoding function, first let $u_i^{\mathcal{F}} = U^{(i)}_{y^i, u^{i-1,\mathcal{F}}}$ be the simulated feedback channel. Note that both encoder and decoder may compute $u_i^{\mathcal{F}}$ at the end of the $i$th time step. The encoding functions are then given by

$$e_i\big(M^\infty, \theta, u^{i-1}\big) = e_i^{\mathcal{F}}\big(M^\infty, \theta_{\mathcal{F}}, u^{i-1,\mathcal{F}}\big) - f\big(y^{i-1} - x^{i-1}\big).$$

5) The decoding length function is unchanged: $L = d_L^{\mathcal{F}}(y^n, \theta, u^{n-1,\mathcal{F}})$.

6) The decoding function is also unchanged: $\hat{M}^L = d_M^{\mathcal{F}}(y^n, \theta, u^{n-1,\mathcal{F}})$.

The prediction scheme at the heart of $\mathcal{G}_n$ is the incremental parsing (IP) algorithm introduced by Feder et al. [13]. This is an elegant and simple algorithm that is based on the same parsing procedure as Lempel and Ziv's compression scheme. Rather than describe its operation in detail, we point the reader towards the exposition in Sec. V of [13].
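To make the prediction layer concrete, the following sketch builds a Lempel-Ziv style parsing tree and uses it to predict the next noise bit, then forms the surrogate noise as the sequence of prediction errors. It is a simplified stand-in, not the algorithm of [13]: the class name `IPPredictor`, the function `surrogate_noise`, and in particular the plain majority-vote rule at each node are assumptions of this example rather than the prediction rule analyzed in [13].

```python
class IPPredictor:
    """Simplified incremental-parsing predictor over a binary alphabet.

    Each tree node stores how often a 0 or a 1 followed that context.
    The majority-vote prediction rule is an illustrative assumption.
    """

    def __init__(self):
        self.root = {"counts": [0, 0], "children": {}}
        self.node = self.root

    def predict(self):
        zeros, ones = self.node["counts"]
        return 1 if ones > zeros else 0  # ties (and unseen contexts) predict 0

    def update(self, bit):
        self.node["counts"][bit] += 1
        child = self.node["children"].get(bit)
        if child is None:
            # End of a phrase: grow the tree by one leaf and restart at the root.
            self.node["children"][bit] = {"counts": [0, 0], "children": {}}
            self.node = self.root
        else:
            self.node = child


def surrogate_noise(z):
    """Return the predictor's error indications, i.e. z_i XOR (prediction of z_i)."""
    predictor = IPPredictor()
    errors = []
    for bit in z:
        errors.append(bit ^ predictor.predict())
        predictor.update(bit)
    return errors


periodic = [0, 1] * 500
errors = surrogate_noise(periodic)
print(sum(errors) / len(errors))  # fraction of prediction errors
```

On a strongly structured sequence such as the alternating one above, prediction errors occur only near phrase boundaries, so the surrogate noise is much sparser than the original noise and its first-order empirical entropy is correspondingly lower.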

C. Analysis of performance 

The schemes $\{\mathcal{G}_n\}$ have been constructed to simplify the encoding and decoding process. Here, this notion is quantified.

As a layered scheme, $\mathcal{G}_n$ consists of two machines running in parallel: the IP predictor of Feder et al. [13] and the actual communication scheme $\mathcal{F}_n$ of Shayevitz and Feder [5]. The operations-per-time-step required by $\mathcal{F}_n$ do not scale appreciably with $n$, so the bottleneck is the prediction operation. Observe that the complexity bottleneck for $\bar{\mathcal{F}}^m$ also arises from accounting for the noise sequence's memory and correlation. As previously mentioned, accounting for memory with $\bar{\mathcal{F}}^m$ requires an exponential number of operations-per-time-step. The IP predictor, on the other hand, requires only a linear number of operations in order to produce an estimate. Specifically, at each time step the Lempel-Ziv parsing tree must be extended.

The rates achieved by schemes $\mathcal{G}_n$, however, do not quite reach porosity. To illustrate this, we start by repeating the definition of predictability as given in [13].

Definition 5: The finite-state predictability of a sequence $x^\infty$ is the minimum limit-supremum fraction of errors that a finite-state predictor can attain when operating on $x^\infty$. Just as with compressibility, one may define a limit infimum version of this quantity. We term the former the worst-case predictability $\bar{\pi}$ and the latter the best-case predictability $\underline{\pi}$.

In Theorem 4 of [13] it is shown that the IP predictor achieves the worst-case predictability $\bar{\pi}(x^\infty)$ of any sequence $x^\infty$. Though it is not stated in the theorem, the proof that is given also demonstrates that the IP predictor achieves the best-case predictability $\underline{\pi}(x^\infty)$. Therefore the limit supremum (or infimum) first-order empirical entropy of the surrogate noise sequence approaches $h_b(\bar{\pi}(x^\infty))$ (or $h_b(\underline{\pi}(x^\infty))$). By applying the schemes $\mathcal{F}_n$ to this noise sequence, the performance approaches rate $1 - h_b(\bar{\pi}(x^\infty))$ (or $1 - h_b(\underline{\pi}(x^\infty))$) with vanishing error.

In Sec. VI of [13], the worst-case predictability $\bar{\pi}(x^\infty)$ is bounded in terms of the compressibility:

$$h_b^{-1}\big(\bar{\rho}(x^\infty)\big) \le \bar{\pi}(x^\infty) \le \tfrac{1}{2}\bar{\rho}(x^\infty).$$

An identical set of bounds exists between the best-case predictability and best-case compressibility. When a noise sequence satisfies the lower bound with equality, one may observe that the asymptotic performance of $\{\mathcal{G}_n\}$ matches that of $\{\bar{\mathcal{F}}^m\}$. However, this is usually not the case, and one must settle for the guarantee of worst-case rate $1 - h_b(\bar{\rho}(z^\infty)/2)$ and best-case rate $1 - h_b(\underline{\rho}(z^\infty)/2)$. Each falls strictly below worst- and best-case porosity unless the noise sequence is either completely redundant or incompressible.
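The gap between porosity and the guaranteed rate of the layered scheme can be tabulated directly from the two expressions above. The sketch below simply evaluates $1 - \rho$ against $1 - h_b(\rho/2)$ on a few compressibility values; the function names are hypothetical and the sample values of $\rho$ are arbitrary.

```python
from math import log2


def h_b(x):
    """Binary entropy function in bits, with h_b(0) = h_b(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1.0 - x) * log2(1.0 - x)


def porosity_rate(rho):
    """Rate achieved by the universal schemes: one minus compressibility."""
    return 1.0 - rho


def layered_rate(rho):
    """Guaranteed rate of the prediction-based scheme: 1 - h_b(rho / 2)."""
    return 1.0 - h_b(rho / 2.0)


for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"rho={rho:.2f}  porosity={porosity_rate(rho):.4f}  "
          f"layered guarantee={layered_rate(rho):.4f}")
```

The two rates coincide only at the endpoints $\rho = 0$ and $\rho = 1$; in between, $h_b(\rho/2) > \rho$ because the binary entropy function is strictly concave, which is the source of the sub-optimality noted in the text.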

X. Summary 

In this work, the best-case/worst-case porosity $\bar{\sigma}(\cdot)/\underline{\sigma}(\cdot)$ of a binary noise sequence is defined as one minus the best-case/worst-case compressibility, $1 - \underline{\rho}(\cdot)$ / $1 - \bar{\rho}(\cdot)$. Porosity may be seen as an individual sequence property, analogous to compressibility or predictability, that identifies the ease of communication through a modulo-additive noise sequence. Two results regarding porosity are at the core of this work. First, porosity is identified as the maximum achievable rate within the class of finite-state communication schemes. Second, it is shown that porosity may be universally achieved within this class. Together, these results parallel those of Lempel and Ziv in the source coding context [2]. Furthering this analogy, the
achievability schemes given here complement those of Lomnitz and Feder in similar manner as the infinite-state and 
finite-state schemes of 0. 

In addition to the above, a more practical universal communication architecture is introduced, built upon prediction. 
Rather than communicate using blocks of channel uses — which contributes to an exponentially growing complexity — 
a layered approach is taken. A prediction algorithm first "removes" the memory from the noise, and then a simple first-order 
communication scheme is employed. While the resulting algorithm is suboptimal, it reduces complexity considerably, and also 
draws an operational connection between predictability and porosity. 

Appendix A
Proof of Lemma 1

We show that the distinction between block-by-block and sliding-window empirical entropy computations vanishes in the limit of large blocks and long blocklengths. First, a few definitions that simplify notation:

Definition 6: A $k$-block code $C$ maps $k$-tuples from an alphabet $\mathcal{X}^k$ into binary strings of arbitrary but finite length.
Definition 7: The $(\theta, \tilde{k})$-extension code for a $k$-block code $C$ is a $\tilde{k}$-block code $C_\theta$ whose encoding of a block $x^{\tilde{k}}$ is given by

$$C_\theta\big(x^{\tilde{k}}\big) = \Big(x_1^{\theta},\; C\big(x_{\theta+1}^{\theta+k}\big),\; \ldots,\; C\big(x_{\theta+(J-1)k+1}^{\theta+Jk}\big),\; x_{\theta+Jk+1}^{\tilde{k}}\Big),$$

where $J = \lfloor(\tilde{k} - \theta)/k\rfloor$ is the number of complete $k$-blocks that fit after the first $\theta$ symbols.

The $\tilde{k}$-extension code is a $\tilde{k}$-block code whose encoding is given by

$$\tilde{C}\big(x^{\tilde{k}}\big) = \operatorname*{arg\,min}_{C_\theta(x^{\tilde{k}}),\; \theta \in \{0, \ldots, k-1\}} \ell\big(C_\theta(x^{\tilde{k}})\big),$$

where $\ell(\cdot)$ returns the length of a binary string.

In both the $\tilde{k}$-extension and the $(\theta, \tilde{k})$-extension, the initial segment $x_1^{\theta}$ is referred to as the uncoded prefix, while the punctuating segment $x_{\theta+Jk+1}^{\tilde{k}}$ is the uncoded suffix. The encoded segments in the middle are called the encoded subblocks.

We start by demonstrating that the block-by-block empirical entropy limits exist.
Lemma 16: Let $x^\infty$ be a finite-alphabet sequence. Then the limits

$$\lim_{k\to\infty} \limsup_{n\to\infty} \frac{1}{k} H^k(x^n)$$

and

$$\lim_{k\to\infty} \liminf_{n\to\infty} \frac{1}{k} H^k(x^n)$$

both exist.
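The two empirical entropies compared throughout this appendix differ only in how $k$-tuples are extracted from $x^n$: at non-overlapping block boundaries, or at every position. The sketch below computes both for a short sequence; the function names `block_entropy` and `sliding_entropy` are hypothetical, and the alternating test sequence is chosen because it makes the difference at small $k$ especially visible.

```python
from collections import Counter
from math import log2


def _entropy(counts):
    """Entropy in bits of the empirical distribution given by a Counter."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def block_entropy(x, k):
    """Block-by-block k-th order empirical entropy (non-overlapping k-tuples)."""
    return _entropy(Counter(tuple(x[i:i + k]) for i in range(0, len(x) - k + 1, k)))


def sliding_entropy(x, k):
    """Sliding-window k-th order empirical entropy (all overlapping k-tuples)."""
    return _entropy(Counter(tuple(x[i:i + k]) for i in range(0, len(x) - k + 1)))


x = [0, 1] * 50
print(block_entropy(x, 2), sliding_entropy(x, 2))
```

For this sequence the block-by-block parse at $k = 2$ sees only the tuple $(0, 1)$, while the sliding window sees $(0, 1)$ and $(1, 0)$ almost equally often. The lemmas of this appendix show that such phase effects disappear after normalizing by $k$ and letting $k \to \infty$.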

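Proof: Let $\tilde{k} > k$, and let $C_{n,k}$ denote the $k$-block Huffman code for the block-by-block empirical distribution $\hat{p}(X^k)[x^n]$. Observe that by optimality of the Huffman code,

$$H^k(x^n) \le E\big[\ell\big(C_{n,k}(X^k)\big)\big] = E\Big[\ell\Big(C_{n,k}\big(x_{kN+1}^{kN+k}\big)\Big)\Big] \le H^k(x^n) + 1, \tag{19}$$

where $N$ is distributed uniformly over the set $\{0, \ldots, \lfloor n/k \rfloor - 1\}$. Let $\tilde{C}_{n,k,\theta}$ and $\tilde{C}_{n,k}$ be the $(\theta, \tilde{k})$- and $\tilde{k}$-extensions of $C_{n,k}$.

By expressing the expected length of $\tilde{C}_{n,k}$ in terms of the expected length of $C_{n,k}$, we can show that both limits in the lemma statement exist.

Allowing $M$ to be uniformly distributed over the set $\{0, \ldots, \lfloor n/\tilde{k} \rfloor - 1\}$, we have that

$$\begin{aligned}
H^{\tilde{k}}(x^n) &\overset{(a)}{\le} E\Big[\ell\Big(\tilde{C}_{n,k}\big(x_{\tilde{k}M+1}^{\tilde{k}M+\tilde{k}}\big)\Big)\Big] \\
&\overset{(b)}{=} E\Big[\min_{\theta \in \{0,\ldots,k-1\}} \ell\Big(\tilde{C}_{n,k,\theta}\big(x_{\tilde{k}M+1}^{\tilde{k}M+\tilde{k}}\big)\Big)\Big] \\
&\overset{(c)}{\le} E\Big[\ell\Big(\tilde{C}_{n,k,\,-M\tilde{k} \bmod k}\big(x_{\tilde{k}M+1}^{\tilde{k}M+\tilde{k}}\big)\Big)\Big] \\
&\overset{(d)}{\le} E\Bigg[2k + \sum_{m=0}^{\lfloor n/k \rfloor - 1} \ell\Big(C_{n,k}\big(x_{km+1}^{km+k}\big)\Big)\, 1\big\{km+1 \ge \tilde{k}M+1,\; km+k \le \tilde{k}M+\tilde{k}\big\}\Bigg] \\
&= 2k + \sum_{m=0}^{\lfloor n/k \rfloor - 1} P\big(km+1 \ge \tilde{k}M+1,\; km+k \le \tilde{k}M+\tilde{k}\big)\, \ell\Big(C_{n,k}\big(x_{km+1}^{km+k}\big)\Big) \\
&\overset{(e)}{\le} 2k + \frac{\tilde{k}}{n - \tilde{k}} \sum_{m=0}^{\lfloor n/k \rfloor - 1} \ell\Big(C_{n,k}\big(x_{km+1}^{km+k}\big)\Big) \\
&\overset{(f)}{=} 2k + \left\lfloor \frac{n}{k} \right\rfloor \frac{\tilde{k}}{n - \tilde{k}}\, E\Big[\ell\Big(C_{n,k}\big(x_{kN+1}^{kN+k}\big)\Big)\Big] \\
&\overset{(g)}{\le} 2k + \frac{\tilde{k}}{k}\, \frac{n}{n - \tilde{k}} \big(H^k(x^n) + 1\big).
\end{aligned}$$

Step (a) follows by Shannon's source coding converse, and (b) is from the definition of $\tilde{C}_{n,k}$. (c): Observe that by setting $\theta = -M\tilde{k} \bmod k$, every encoded subblock within $\tilde{C}_{n,k,\theta}$ is aligned so that it will be of the form $C_{n,k}(x_{km+1}^{km+k})$ for some integer $m$. Step (d) involves first upper-bounding the length of the unencoded prefix and suffix components of $\tilde{C}_{n,k}(x_{\tilde{k}M+1}^{\tilde{k}M+\tilde{k}})$ at $k$ bits each, and then summing the lengths of each of the encoded subblocks. Note that an encoded subblock $C_{n,k}(x_{km+1}^{km+k})$ only appears in $\tilde{C}_{n,k,\,-M\tilde{k} \bmod k}(x_{\tilde{k}M+1}^{\tilde{k}M+\tilde{k}})$ if $x_{km+1}^{km+k}$ is fully contained within the $\tilde{k}$-block being encoded. The indicator functions ensure that only these encoded subblocks contribute to the length summation. (e) follows from recognizing that $M$ takes $\lfloor n/\tilde{k} \rfloor$ values uniformly, and that at most $\tilde{k}/k$ of these positions satisfy the conditions $mk + 1 \ge M\tilde{k} + 1$ and $mk + k \le M\tilde{k} + \tilde{k}$. (f) replaces the summation with an expectation, where the random variable $N$ is uniformly distributed over the set $\{0, \ldots, \lfloor n/k \rfloor - 1\}$. (g) invokes (19).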
Dividing both sides of this resulting inequality by $\tilde{k}$ and taking the limit supremum with respect to $n$, we have that

$$\limsup_{n\to\infty} \frac{1}{\tilde{k}} H^{\tilde{k}}(x^n) \le \frac{2k}{\tilde{k}} + \frac{1}{k} + \limsup_{n\to\infty} \frac{1}{k} H^k(x^n).$$

Taking the limit supremum of both sides of this expression with $\tilde{k}$, we have that

$$\limsup_{\tilde{k}\to\infty} \limsup_{n\to\infty} \frac{1}{\tilde{k}} H^{\tilde{k}}(x^n) \le \frac{1}{k} + \limsup_{n\to\infty} \frac{1}{k} H^k(x^n).$$

Finally, taking the limit infimum with respect to $k$,

$$\limsup_{\tilde{k}\to\infty} \limsup_{n\to\infty} \frac{1}{\tilde{k}} H^{\tilde{k}}(x^n) \le \liminf_{k\to\infty} \limsup_{n\to\infty} \frac{1}{k} H^k(x^n).$$

This proves that $\lim_{k\to\infty} \limsup_{n\to\infty} \frac{1}{k} H^k(x^n)$ exists.

Repeating this last set of arguments with the limit infimum with respect to $n$ proves that $\lim_{k\to\infty} \liminf_{n\to\infty} \frac{1}{k} H^k(x^n)$ exists. ■

Next, we prove that the sliding-window compressibility can be no greater than the block-by-block compressibility.

Lemma 17: Let $x^\infty$ be a finite-alphabet sequence. Then the following two statements hold:

$$\underline{\rho}(x^\infty) \le \lim_{k\to\infty} \liminf_{n\to\infty} \frac{1}{k} H^k(x^n)$$

and

$$\bar{\rho}(x^\infty) \le \lim_{k\to\infty} \limsup_{n\to\infty} \frac{1}{k} H^k(x^n).$$

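Proof: As the form of this proof is very similar to that of Lemma 16, exposition will be limited.
Let $\tilde{k} > k$, let $C_{n,k}$ denote the $k$-block Huffman code for the block-by-block empirical distribution $\hat{p}(X^k)[x^n]$, and let $\tilde{C}_{n,k,\theta}$ and $\tilde{C}_{n,k}$ be the $(\theta, \tilde{k})$- and $\tilde{k}$-extensions of $C_{n,k}$.

Allowing $M$ to be uniformly distributed over the set $\{0, \ldots, n - \tilde{k}\}$, we have that

$$\begin{aligned}
H^{\tilde{k}}_{\mathrm{sw}}(x^n) &\le E\Big[\ell\Big(\tilde{C}_{n,k}\big(x_{M+1}^{M+\tilde{k}}\big)\Big)\Big] \\
&= E\Big[\min_{\theta \in \{0,\ldots,k-1\}} \ell\Big(\tilde{C}_{n,k,\theta}\big(x_{M+1}^{M+\tilde{k}}\big)\Big)\Big] \\
&\le E\Big[\ell\Big(\tilde{C}_{n,k,\,-M \bmod k}\big(x_{M+1}^{M+\tilde{k}}\big)\Big)\Big] \\
&\le E\Bigg[2k + \sum_{m=0}^{\lfloor n/k \rfloor - 1} \ell\Big(C_{n,k}\big(x_{km+1}^{km+k}\big)\Big)\, 1\big\{km+1 \ge M+1,\; km+k \le M+\tilde{k}\big\}\Bigg] \\
&= 2k + \sum_{m=0}^{\lfloor n/k \rfloor - 1} P\big(km+1 \ge M+1,\; km+k \le M+\tilde{k}\big)\, \ell\Big(C_{n,k}\big(x_{km+1}^{km+k}\big)\Big) \\
&\le 2k + \frac{\tilde{k}}{n - \tilde{k} - 1} \sum_{m=0}^{\lfloor n/k \rfloor - 1} \ell\Big(C_{n,k}\big(x_{km+1}^{km+k}\big)\Big) \\
&\le 2k + \frac{\tilde{k}}{k}\, \frac{n}{n - \tilde{k} - 1}\, E\Big[\ell\Big(C_{n,k}\big(x_{kN+1}^{kN+k}\big)\Big)\Big], \quad N \sim \mathrm{Unif}\{0, \ldots, \lfloor n/k \rfloor - 1\} \\
&\le 2k + \frac{\tilde{k}}{k}\, \frac{n}{n - \tilde{k} - 1} \big(H^k(x^n) + 1\big).
\end{aligned}$$

Dividing both sides by $\tilde{k}$ and taking the limit supremum in $n$, we have that

$$\limsup_{n\to\infty} \frac{1}{\tilde{k}} H^{\tilde{k}}_{\mathrm{sw}}(x^n) \le \frac{2k}{\tilde{k}} + \frac{1}{k}\Big(\limsup_{n\to\infty} H^k(x^n) + 1\Big).$$

Now taking the limit supremum in $\tilde{k}$ followed by the limit supremum in $k$, we have that

$$\limsup_{\tilde{k}\to\infty} \limsup_{n\to\infty} \frac{1}{\tilde{k}} H^{\tilde{k}}_{\mathrm{sw}}(x^n) \le \limsup_{k\to\infty} \limsup_{n\to\infty} \frac{1}{k} H^k(x^n). \tag{20}$$

We can identically show (by replacing all limits supremum with limits infimum) that

$$\liminf_{\tilde{k}\to\infty} \liminf_{n\to\infty} \frac{1}{\tilde{k}} H^{\tilde{k}}_{\mathrm{sw}}(x^n) \le \liminf_{k\to\infty} \liminf_{n\to\infty} \frac{1}{k} H^k(x^n). \tag{21}$$

Equations (20) and (21) prove the lemma. ■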
We now prove the opposite direction: that the sliding-window compressibility can be no smaller than the block-by-block compressibility.

Lemma 18: Let $x^\infty$ be a finite-alphabet sequence. Then

$$\underline{\rho}(x^\infty) \ge \lim_{k\to\infty} \liminf_{n\to\infty} \frac{1}{k} H^k(x^n)$$

and

$$\bar{\rho}(x^\infty) \ge \lim_{k\to\infty} \limsup_{n\to\infty} \frac{1}{k} H^k(x^n).$$

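Proof: Let $\tilde{k} > k$ be two blocklengths, and let $C_{n,k}$ be the optimal $k$-block Huffman code for the $k$th-order sliding-window distribution $\hat{p}^k_{\mathrm{sw}}(X^k)[x^n]$. Then there must exist a phase $\theta^* \in \{0, \ldots, k-1\}$ such that

$$E\Big[\ell\Big(C_{n,k}\big(x_{kN+1+\theta^*}^{k(N+1)+\theta^*}\big)\Big)\Big] \le H^k_{\mathrm{sw}}(x^n) + 1, \tag{22}$$

where $N$ is distributed uniformly across the set $\{0, \ldots, \lfloor (n - \theta^*)/k \rfloor - 1\}$. This follows from the within-one-bit optimality of $C_{n,k}$ over the sliding-window distribution, and because the sliding-window distribution may be expressed as a (nonnegative) linear combination of the block-by-block distributions $\hat{p}^k(X^k)[x_\theta^n]$ computed according to each phase $\theta$.
Define $\tilde{C}_{n,k,\theta}$ and $\tilde{C}_{n,k}$ as the $(\theta, \tilde{k})$- and $\tilde{k}$-extensions of $C_{n,k}$.

A familiar sequence of inequalities may then be applied. Allowing $M$ to be uniformly distributed over the set $\{0, \ldots, \lfloor n/\tilde{k} \rfloor - 1\}$, we have:

$$\begin{aligned}
H^{\tilde{k}}(x^n) &\overset{(a)}{\le} E\Big[\ell\Big(\tilde{C}_{n,k}\big(x_{\tilde{k}M+1}^{\tilde{k}(M+1)}\big)\Big)\Big] = E\Big[\min_{\theta \in \{0,\ldots,k-1\}} \ell\Big(\tilde{C}_{n,k,\theta}\big(x_{\tilde{k}M+1}^{\tilde{k}(M+1)}\big)\Big)\Big] \\
&\overset{(b)}{\le} E\Big[\ell\Big(\tilde{C}_{n,k,\,(\theta^* - M\tilde{k}) \bmod k}\big(x_{\tilde{k}M+1}^{\tilde{k}(M+1)}\big)\Big)\Big] \\
&\overset{(c)}{\le} E\Bigg[2k + \sum_{m=0}^{\lfloor (n-\theta^*)/k \rfloor - 1} \ell\Big(C_{n,k}\big(x_{km+1+\theta^*}^{k(m+1)+\theta^*}\big)\Big)\, 1\big\{M\tilde{k}+1 \le km+\theta^*+1,\; M\tilde{k}+\tilde{k} \ge mk+\theta^*+k\big\}\Bigg] \\
&= 2k + \sum_{m=0}^{\lfloor (n-\theta^*)/k \rfloor - 1} P\big(M\tilde{k}+1 \le km+\theta^*+1,\; M\tilde{k}+\tilde{k} \ge mk+\theta^*+k\big)\, \ell\Big(C_{n,k}\big(x_{km+1+\theta^*}^{k(m+1)+\theta^*}\big)\Big) \\
&\overset{(d)}{\le} 2k + \frac{\tilde{k}}{n - \theta^* - \tilde{k}} \sum_{m=0}^{\lfloor (n-\theta^*)/k \rfloor - 1} \ell\Big(C_{n,k}\big(x_{km+1+\theta^*}^{k(m+1)+\theta^*}\big)\Big) \\
&\overset{(e)}{\le} 2k + \frac{\tilde{k}}{k}\, \frac{n - \theta^*}{n - \theta^* - \tilde{k}}\, E\Big[\ell\Big(C_{n,k}\big(x_{kN+1+\theta^*}^{k(N+1)+\theta^*}\big)\Big)\Big] \\
&\overset{(f)}{\le} 2k + \frac{\tilde{k}}{k}\, \frac{n - \theta^*}{n - \theta^* - \tilde{k}} \big(H^k_{\mathrm{sw}}(x^n) + 1\big).
\end{aligned}$$

Step (a) follows from Shannon's converse and the definition of the $\tilde{k}$-block empirical entropy $H^{\tilde{k}}(x^n)$. (b) sets $\theta = (\theta^* - M\tilde{k}) \bmod k$ so that every encoded subblock within $\tilde{C}_{n,k,\theta}$ is of the form $C_{n,k}(x_{km+1+\theta^*}^{k(m+1)+\theta^*})$ for some integer $m$. (c) upper-bounds the length of an encoding first by bounding the suffix and prefix at $k$ bits each, and then by summing the encoding lengths for each contributing encoded subblock. Observe that an encoded subblock $C_{n,k}(x_{km+1+\theta^*}^{k(m+1)+\theta^*})$ only appears in $\tilde{C}_{n,k,\,(\theta^* - M\tilde{k}) \bmod k}(x_{\tilde{k}M+1}^{\tilde{k}(M+1)})$ if $x_{km+1+\theta^*}^{k(m+1)+\theta^*}$ is fully contained within the $\tilde{k}$-block being encoded, $x_{\tilde{k}M+1}^{\tilde{k}(M+1)}$. (d) follows from recognizing that $M$ takes $\lfloor (n - \theta^*)/\tilde{k} \rfloor$ values uniformly, and that at most $\tilde{k}/k$ of these positions satisfy the conditions $M\tilde{k} + 1 \le km + \theta^* + 1$ and $M\tilde{k} + \tilde{k} \ge mk + \theta^* + k$. (e) introduces $N$ as a random variable distributed uniformly over $\{0, \ldots, \lfloor (n - \theta^*)/k \rfloor - 1\}$, and the sum is replaced by an expectation. Finally, (f) is a direct application of (22).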
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Dividing both sides by k and taking the limit infimum in n, we have that 



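\[ \liminf_{n\to\infty} \frac{1}{k} H_k(x^n) \le \frac{2\tilde{k}}{k} + \frac{1}{\tilde{k}} \left( \liminf_{n\to\infty} H^{sw}_{\tilde{k}}(x^n) + 1 \right). \]

Taking the limit infimum in $k$, followed by the limit infimum in $\tilde{k}$, we have

\[ \liminf_{k\to\infty} \liminf_{n\to\infty} \frac{1}{k} H_k(x^n) \le \liminf_{\tilde{k}\to\infty} \liminf_{n\to\infty} \frac{1}{\tilde{k}} H^{sw}_{\tilde{k}}(x^n). \tag{23} \]

By repeating these two steps with limits supremum instead of limits infimum, we find that

\[ \limsup_{k\to\infty} \limsup_{n\to\infty} \frac{1}{k} H_k(x^n) \le \limsup_{\tilde{k}\to\infty} \limsup_{n\to\infty} \frac{1}{\tilde{k}} H^{sw}_{\tilde{k}}(x^n). \tag{24} \]

Lemma [1] now follows from Lemmas [16], [17], and [18].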
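As a rough numerical illustration of the two normalized quantities compared above, the sketch below computes the $k$-block empirical entropy and the sliding-window empirical entropy of a finite string. The function names and the alternating test sequence are assumptions made for this example only. It shows how the two entropies can differ at a fixed blocklength because of phase, while both normalized values shrink as $k$ grows for a periodic sequence.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Entropy in bits of the empirical distribution given by a Counter."""
    total = sum(counts.values())
    return sum(c / total * log2(total / c) for c in counts.values())


def block_entropy(x, k):
    """k-block empirical entropy: non-overlapping blocks x[kM:k(M+1)]."""
    return entropy(Counter(x[k * M: k * (M + 1)] for M in range(len(x) // k)))


def sliding_entropy(x, k):
    """Sliding-window empirical entropy: all overlapping windows x[N:N+k]."""
    return entropy(Counter(x[N:N + k] for N in range(len(x) - k + 1)))


x = "01" * 500  # alternating sequence, chosen only for illustration
for k in (2, 4, 8):
    # Columns: k, normalized block entropy, normalized sliding-window entropy.
    print(k, block_entropy(x, k) / k, sliding_entropy(x, k) / k)
```

For this sequence every aligned block is identical, so the block entropy is zero, whereas the sliding windows take two values and contribute at most one bit in total, which becomes negligible after dividing by $k$.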



Appendix B 
Proof of Lemma[6] 

Proof: We begin by defining a function $n_M(M^\infty)$ that specifies the truncated source $M^{n_M(M^\infty)}$.

1) If $M^\infty \notin \mathcal{M}_k(z^\infty)$, then select only the first bit: $n_M(M^\infty) = 1$.

2) Suppose $M^\infty \in \mathcal{M}_k(z^\infty)$. Then let $\{L_i\}$, $\{n_i\}$, and $p(M^L, z^k)$ be specified so that the condition defining $\mathcal{M}_k(z^\infty)$ is satisfied.

Let $C_{M^\infty}$ be the optimal conditional Huffman code for $p(M^L | z^k)$, and let $B_{M^\infty}$ be the number of bits required to describe $C_{M^\infty}$ to a decoder.

We then choose $i(M^\infty) \in \mathbb{Z}^+$ sufficiently large so that the following three conditions are satisfied:



C1 $E[L]_p - E[L]_{p_{n_i}} < \delta$, where $p_{n_i}$ is the empirical distribution of $(M^L, z^k)^{n_i}$. This can be satisfied because $p$ is a limiting empirical distribution on the points $\{n_i\}$.

C2 $E[\ell(C_{M^\infty}(M^L | z^k))]_{p_{n_i}} < H_p(M^L | z^k) + 1$. This is possible since we know that $E[\ell(C_{M^\infty}(M^L | z^k))]_p < H_p(M^L | z^k) + 1$, and that $p_{n_i} \to p$.

C3 $B_{M^\infty} / n_i < \delta$. This is possible simply because $B_{M^\infty}$ is finite.
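Condition C2 rests on the standard fact that a Huffman code's expected length is within one bit of the entropy, applied conditionally on the side information. The sketch below illustrates that fact on made-up data. The sample pairs `(m, z)` and the helper names are assumptions for illustration, not objects defined in the paper. One Huffman code is built per value of $z$, and the resulting average length is compared with the empirical conditional entropy.

```python
import heapq
from collections import Counter
from itertools import count
from math import log2


def huffman_lengths(probs):
    """Return {symbol: codeword length} of a binary Huffman code for probs."""
    if len(probs) == 1:
        return {s: 1 for s in probs}  # convention: a lone symbol costs one bit
    tie = count()  # unique tie-breaker so the heap never compares symbol lists
    heap = [(p, next(tie), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every merge adds one bit to the merged symbols
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths


def conditional_stats(pairs):
    """Return (average conditional Huffman length, conditional entropy in bits)."""
    by_z = {}
    for m, z in pairs:
        by_z.setdefault(z, Counter())[m] += 1
    n = len(pairs)
    avg_len = cond_entropy = 0.0
    for counts in by_z.values():
        nz = sum(counts.values())
        probs = {m: c / nz for m, c in counts.items()}
        lengths = huffman_lengths(probs)
        avg_len += nz / n * sum(p * lengths[m] for m, p in probs.items())
        cond_entropy += nz / n * sum(p * log2(1 / p) for p in probs.values())
    return avg_len, cond_entropy


# Hypothetical (message block, noise block) samples, for illustration only.
pairs = ([("0", "00")] * 5 + [("10", "00")] * 2 + [("11", "00")]
         + [("0", "01")] * 3 + [("1", "01")] * 3)
avg_len, cond_entropy = conditional_stats(pairs)
assert cond_entropy <= avg_len < cond_entropy + 1
```

The single-symbol convention in `huffman_lengths` is a choice made for this sketch. In that degenerate case the upper bound holds only with equality, so the example data gives every noise block at least two distinct message blocks.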

Armed with $i(M^\infty)$ we may define the following:

• $n(M^\infty) = n_{i(M^\infty)}$ is the relevant index in the partition sequence $\{(M^L, z^k)_i\}$.

• $n_M(M^\infty) = \sum_{i=1}^{n(M^\infty)} L_i$ is the relevant index in the source sequence $M^\infty$.

• $n_z(M^\infty) = k\, n(M^\infty)$ is the relevant index in the noise sequence $z^\infty$.

We now describe a variable-length source coding scheme for this constructed source $M^{n_M(M^\infty)}$ with encoder side information $M^\infty$:

1) If $M^\infty \notin \mathcal{M}_k$, directly encode $M^{n_M(M^\infty)}$. Recall that since $n_M((\mathcal{M}_k)^c) = 1$, this is just the first bit $M_1$.

2) If $M^\infty \in \mathcal{M}_k$: First, specify $C_{M^\infty}$ with the first $B_{M^\infty}$ bits in the encoding. Then, apply $C_{M^\infty}$ to each of the $n(M^\infty)$ blocks in the $k$-partition $(M^L, z^k)^{n(M^\infty)}$.

Call this encoding function $F(M^{n_M(M^\infty)})$. By Shannon's source coding converse, the expected length of this encoding must exceed the entropy of the source $M^{n_M(M^\infty)}$. We demonstrate that for this to hold $\mathcal{M}_k$ must be of measure zero.
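\begin{align*}
0 &\le E\left[ \ell\left( F\left( M^{n_M(M^\infty)} \right) \right) \right] - H\left( M^{n_M(M^\infty)} \right) \\
&\overset{(a)}{\le} E\left[ \ell\left( F\left( M^{n_M(M^\infty)} \right) \right) \right] - E\left[ n_M(M^\infty) \right] \\
&\overset{(b)}{=} P(M^\infty \in \mathcal{M}_k)\, E\left[ B_{M^\infty} + n(M^\infty)\, E\left[ \ell(C_{M^\infty}(M^L | z^k)) \right]_{p_{n(M^\infty)}} - n_M(M^\infty) \,\middle|\, M^\infty \in \mathcal{M}_k \right] \\
&\overset{(c)}{\le} P(M^\infty \in \mathcal{M}_k)\, E\left[ n(M^\infty) \left( \delta + E\left[ \ell(C_{M^\infty}(M^L | z^k)) \right]_{p_{n(M^\infty)}} \right) - n_M(M^\infty) \,\middle|\, M^\infty \in \mathcal{M}_k \right] \\
&\overset{(d)}{=} P(M^\infty \in \mathcal{M}_k)\, E\left[ n(M^\infty) \left( \delta + E\left[ \ell(C_{M^\infty}(M^L | z^k)) \right]_{p_{n(M^\infty)}} \right) - n(M^\infty)\, E[L]_{p_{n(M^\infty)}} \,\middle|\, M^\infty \in \mathcal{M}_k \right] \\
&\overset{(e)}{\le} P(M^\infty \in \mathcal{M}_k)\, E\left[ n(M^\infty) \left( \delta + H_p(M^L | z^k) + 1 - E[L]_{p_{n(M^\infty)}} \right) \,\middle|\, M^\infty \in \mathcal{M}_k \right] \\
&\overset{(f)}{\le} P(M^\infty \in \mathcal{M}_k)\, E\left[ n(M^\infty) \left( 2\delta + H_p(M^L | z^k) + 1 - E[L]_p \right) \,\middle|\, M^\infty \in \mathcal{M}_k \right] \\
&\overset{(g)}{\le} P(M^\infty \in \mathcal{M}_k)\, E\left[ n(M^\infty) \left( H_p(M^L | z^k) + 1 - E[L]_p \right) \,\middle|\, M^\infty \in \mathcal{M}_k \right].
\end{align*}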



Step (a) holds by Lemma[5]. Step (b) follows from an expansion of both expectations. (c) is due to condition C3. The definition of $n_M(M^\infty)$ yields (d). Condition C2 implies (e). (f) follows from condition C1. Because $\delta$ can be arbitrarily small, (g) holds. Finally, by the definition of $\mathcal{M}_k$, this last inequality can only be satisfied if $P(M^\infty \in \mathcal{M}_k)$ is zero. ■

Appendix C 
Proof of Lemma[13]

First, some notational conveniences are defined. Suppose the given repetition scheme T is applied to an iid Bernoulli(1/2) source $M^\infty$ and fixed noise sequence $z^\infty$. Let $\{L_i\}$ be the number of source bits decoded in each block. Consider the $i$th block, i.e. channel uses $(i-1)N+1$ through $iN$. Let $M^{(i)}$ indicate the source bits used by the encoder, which begin at index $\sum_{j=1}^{i-1} L_j + 1$, let $\hat{M}^{(i)}$ indicate the estimate produced, let $R_i = L_i/N$ indicate the rate of the block, and let $z^{(i)} = z_{(i-1)N+1}^{iN}$ denote the noise of the block. Finally, let $u^{(i)} = u_{(i-1)N+1}^{iN}$ denote the feedback during the block.



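Observe that the $j$th block in a repetition scheme can only affect a future block in one way: by adjusting $L_j$ and thereby changing $M^{(i)}$ for $i > j$. As such, for all $i > j$, the Markov chain $u^{(i)} - M^{(i)} - (u^{(j)}, M^{(j)})$ holds. The joint distribution of $(u^{(i)}, u^{(j)}, M^{(i)}, M^{(j)})$ may then be written as

\[ p\left( u^{(i)}, u^{(j)}, M^{(i)}, M^{(j)} \right) = p\left( u^{(j)}, M^{(j)} \right) p\left( M^{(i)} \,\middle|\, u^{(j)}, M^{(j)} \right) p\left( u^{(i)} \,\middle|\, M^{(i)} \right). \]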

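Since $E_i$ and $E_j$ are deterministic functions of $(u^{(i)}, M^{(i)})$ and $(u^{(j)}, M^{(j)})$ respectively, we may easily introduce them into the joint distribution:

\[ p\left( u^{(i)}, u^{(j)}, M^{(i)}, M^{(j)}, E_i, E_j \right) = p\left( u^{(j)}, M^{(j)}, E_j \right) p\left( M^{(i)} \,\middle|\, u^{(j)}, M^{(j)} \right) p\left( u^{(i)}, E_i \,\middle|\, M^{(i)} \right). \]

This may be rephrased as

\[ p\left( u^{(i)}, u^{(j)}, M^{(i)}, M^{(j)}, E_i, E_j \right) = p\left( u^{(j)}, M^{(j)}, E_j, M^{(i)} \right) p\left( u^{(i)}, E_i \,\middle|\, M^{(i)} \right). \]

Summing over $(u^{(i)}, u^{(j)}, M^{(j)})$, this yields

\[ p\left( E_i, E_j, M^{(i)} \right) = p\left( E_j, M^{(i)} \right) p\left( E_i \,\middle|\, M^{(i)} \right) = p\left( E_j \,\middle|\, M^{(i)} \right) p\left( E_i \,\middle|\, M^{(i)} \right) p\left( M^{(i)} \right), \]

which proves the Markov relation $E_i - M^{(i)} - E^{i-1}$.

The above arguments may be repeated verbatim to show that $T_i - M^{(i)} - T^{i-1}$.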